Module 13: Spotlight on Classification

by Patrick Boily, with contributions from Olivier Leduc and Shintaro Hagiwara


In Machine Learning 101, we provided a (basically) math-free general overview of machine learning. In this module, we present an introductory mathematical treatment of the discipline, with a focus on classification, ensemble learning, and non-parametric supervised methods.
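To fix ideas before the formal treatment, the snippet below sketches the simplest classifier discussed in this module: a logistic regression fit by gradient descent on a toy 1-D dataset. The data, learning rate, and function names are purely illustrative; the method itself is developed in Section 13.2.1.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w*x + b) on 1-D data by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the score
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Two well-separated classes on the real line (illustrative toy data).
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
predict = lambda x: int(sigmoid(w * x + b) >= 0.5)
print([predict(x) for x in xs])  # the fitted model should recover the labels
```

On separable data such as this, the decision boundary settles near the midpoint between the two classes; later sections replace this hand-rolled sketch with proper estimation, evaluation, and regularization machinery.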

Our approach once again borrows heavily from [2], [3]; explanations and examples are also available in [236], [242].

This is a continuation of the treatment provided in Regression and Value Estimation and a companion piece to Spotlight on Clustering.

Contents

13.1 Overview
     13.1.1 Formalism
     13.1.2 Model Evaluation
     13.1.3 Bias-Variance Trade-Off

13.2 Simple Classification Methods
     13.2.1 Logistic Regression
     13.2.2 Discriminant Analysis
     13.2.3 ROC Curve

13.3 Rare Occurrences

13.4 Other Supervised Approaches
     13.4.1 Tree-Based Methods
     13.4.2 Support Vector Machines
     13.4.3 Artificial Neural Networks
     13.4.4 Naïve Bayes Classifiers

13.5 Ensemble Learning
     13.5.1 Bagging
     13.5.2 Random Forests
     13.5.3 Boosting

13.6 Exercises

References

[2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2008.
[3] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R. Springer, 2014.
[236] B. Boehmke and B. Greenwell, Hands-On Machine Learning with R. CRC Press, 2020.
[242] F. Chollet, Deep Learning with Python, 1st ed. Manning Publications, 2017.