Module 14: Spotlight on Clustering

by Patrick Boily and Jen Schellinck, with contributions from Aditya Maheshwari

In Machine Learning 101, we provided a (basically) math-free general overview of machine learning.

Supervised learning methods can be presented in a formalism that generalizes statistical and regression analysis, and their performance is easy to evaluate; consequently, they have been studied extensively and often form the backbone of machine learning curricula.

On the other hand, apart from a select few classical models, unsupervised learning tasks are not usually presented in quite the same depth, primarily because of the vagueness that infects their core: several important concepts are ambiguously defined, the results are often difficult to validate, and the actionable applications of the outcomes are not always clear.

Interest in such methods and tasks (clustering and segmentation, association rules mining, link profiling, etc.) is mounting, however, alongside the growth of artificial intelligence and machine learning research. In this module, we describe various clustering algorithms and discuss related issues and challenges.

This module continues the treatment provided in Regression and Value Estimation and in Spotlight on Classification.


14.1 Overview
     14.1.1 Unsupervised Learning
     14.1.2 Clustering Framework
     14.1.3 A Philosophical Approach to Clustering

14.2 Simple Clustering Algorithms
     14.2.1 \(k\)-Means and Variants
     14.2.2 Hierarchical Clustering

14.3 Clustering Evaluation
     14.3.1 Clustering Assessment
     14.3.2 Model Selection

14.4 Advanced Clustering Methods
     14.4.1 Density-Based Clustering
     14.4.2 Spectral Clustering
     14.4.3 Probability-Based Clustering
     14.4.4 Affinity Propagation
     14.4.5 Fuzzy Clustering
     14.4.6 Cluster Ensembles

14.5 Exercises