Module 13: Spotlight on Classification

by Patrick Boily, with contributions from Olivier Leduc and Shintaro Hagiwara

In Machine Learning 101, we provided a (mostly) math-free general overview of machine learning. In this module, we present an introductory mathematical treatment of the discipline, with a focus on classification, ensemble learning, and non-parametric supervised methods.

Our approach once again borrows heavily from [2] and [3]; additional explanations and examples are available in [236] and [242].

This is a continuation of the treatment provided in Regression and Value Estimation and a companion piece to Spotlight on Clustering.


13.1 Overview
     13.1.1 Formalism
     13.1.2 Model Evaluation
     13.1.3 Bias-Variance Trade-Off

13.2 Simple Classification Methods
     13.2.1 Logistic Regression
     13.2.2 Discriminant Analysis
     13.2.3 ROC Curve

13.3 Rare Occurrences

13.4 Other Supervised Approaches
     13.4.1 Tree-Based Methods
     13.4.2 Support Vector Machines
     13.4.3 Artificial Neural Networks
     13.4.4 Naïve Bayes Classifiers

13.5 Ensemble Learning
     13.5.1 Bagging
     13.5.2 Random Forests
     13.5.3 Boosting

13.6 Exercises


T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2008.
G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R. Springer, 2014.
B. Boehmke and B. Greenwell, Hands-On Machine Learning with R. CRC Press.
F. Chollet, Deep Learning with Python, 1st ed. Manning Publications, 2017.