# Module 14: Spotlight on Clustering

by Patrick Boily and Jen Schellinck, with contributions from Aditya Maheshwari

In Machine Learning 101, we provided a (basically) math-free general overview of machine learning.

Supervised learning methods can be presented in a formalism that generalizes statistical and regression analysis, and their performance is easy to evaluate; consequently, they have been studied extensively and often form the backbone of machine learning training.

On the other hand, apart from a select few classical models, unsupervised learning tasks are not usually presented in quite the same depth, primarily because of the vagueness that infects their core: a number of important concepts are ambiguously defined, the validation of results is often elusive, and the actionable applications of the outcomes are not usually clear.

Interest in such methods and tasks (clustering and segmentation, association rules mining, link profiling, etc.) is nonetheless mounting, alongside the growth of artificial intelligence and machine learning research. In this module, we describe various clustering algorithms and discuss related issues and challenges.

This is a continuation of the treatment provided in Regression and Value Estimation and Spotlight on Classification.

### Contents

14.1 Overview
14.1.1 Unsupervised Learning
14.1.2 Clustering Framework
14.1.3 A Philosophical Approach to Clustering

14.2 Simple Clustering Algorithms
14.2.1 $k$-Means and Variants
14.2.2 Hierarchical Clustering

14.3 Clustering Evaluation
14.3.1 Clustering Assessment
14.3.2 Model Selection

14.4 Advanced Clustering Methods
14.4.1 Density-Based Clustering
14.4.2 Spectral Clustering
14.4.3 Probability-Based Clustering
14.4.4 Affinity Propagation
14.4.5 Fuzzy Clustering
14.4.6 Cluster Ensembles

14.5 Exercises