Module 11 Machine Learning 101

by Patrick Boily and Jen Schellinck


Many data scientists first encounter the field through machine learning concepts, algorithms, and applications; this module presents all three.

Later modules (Regression and Value Estimation, Spotlight on Classification, and Spotlight on Clustering) cover further technical aspects of machine learning, as well as more sophisticated algorithms (see [4], [5], [2], [196], and [3] for details).

Contents

11.1 Introduction

11.2 Statistical Learning
     11.2.1 Types of Learning
     11.2.2 DS and ML Tasks

11.3 Association Rules Mining
     11.3.1 Overview
     11.3.2 Generating Rules
     11.3.3 The A Priori Algorithm
     11.3.4 Validation
     11.3.5 Case Study: Danish Medical Data
     11.3.6 Toy Example: Titanic Dataset

11.4 Classification and Value Estimation
     11.4.1 Overview
     11.4.2 Classification Algorithms
     11.4.3 Decision Trees
     11.4.4 Performance Evaluation
     11.4.5 Case Study: Minnesota Tax Audit
     11.4.6 Toy Example: Kyphosis Dataset

11.5 Clustering
     11.5.1 Overview
     11.5.2 Clustering Algorithms
     11.5.3 \(k\)-Means
     11.5.4 Clustering Validation
     11.5.5 Case Study: Livehoods
     11.5.6 Toy Example: Iris Dataset

11.6 Issues and Challenges
     11.6.1 Bad Data
     11.6.2 Overfitting/Underfitting
     11.6.3 Appropriateness and Transferability
     11.6.4 Myths and Mistakes

11.7 R Examples
     11.7.1 ARM: Titanic Dataset
     11.7.2 Classification: Kyphosis Dataset
     11.7.3 Clustering: Iris Dataset

11.8 Exercises

References

[2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2008.
[3] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R. Springer, 2014.
[4] C. C. Aggarwal and C. K. Reddy, Eds., Data Clustering: Algorithms and Applications. CRC Press, 2014.
[5] C. C. Aggarwal, Data Mining: The Textbook. Cham: Springer, 2015.
[196] D. Barber, Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.