Module 15 Feature Selection and Dimension Reduction
by Patrick Boily, with contributions from Olivier Leduc, Andrew Macfie, Aditya Maheshwari, and Maia Pelletier
Data mining is the collection of processes by which we can extract useful insights from data. Inherent in this definition is the idea of data reduction: useful insights (whether in the form of summaries, sentiment analyses, etc.) ought to be “smaller” and “more organized” than the original raw data.
The challenges posed by high-dimensional data (the so-called curse of dimensionality) must be addressed to obtain insightful and interpretable analytical results.
In this module, we introduce the basic principles of dimensionality reduction and a number of feature selection methods (filter, wrapper, regularization), and we discuss some advanced topics: singular value decomposition (SVD), spectral feature selection, and uniform manifold approximation and projection (UMAP).