Module 12 Regression and Value Estimation

by Patrick Boily


In Machine Learning 101, we provided a largely math-free overview of machine learning.

In this module, we present an introductory mathematical treatment of the discipline, with a focus on regression and value estimation methods (in particular, on parametric methods).

Our approach borrows heavily from [2] and [3]; explanations and examples are also available in [236].

We will continue the treatment in Spotlight on Classification and Spotlight on Clustering.

Contents

12.1 Statistical Learning
     12.1.1 Supervised Learning Framework
     12.1.2 Systematic Component and Regression
     12.1.3 Model Evaluation
     12.1.4 Bias-Variance Trade-Off

12.2 Regression Modeling
     12.2.1 Formalism
     12.2.2 Least Squares Properties
     12.2.3 Generalizations of OLS
     12.2.4 Shrinkage Methods

12.3 Resampling Methods
     12.3.1 Cross-Validation
     12.3.2 Bootstrap
     12.3.3 Jackknife

12.4 Model Selection
     12.4.1 Best Subset Selection
     12.4.2 Stepwise Selection
     12.4.3 Selecting the Optimal Model

12.5 Nonlinear Modeling
     12.5.1 Basis Function Models
     12.5.2 Splines
     12.5.3 Generalized Additive Models

12.6 Example: Algae Blooms
     12.6.1 Value Estimation Modeling
     12.6.2 Model Evaluation
     12.6.3 Model Predictions

12.7 Exercises

References

[2]
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2008.
[3]
G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R. Springer, 2014.
[236]
B. Boehmke and B. Greenwell, Hands-On Machine Learning with R. CRC Press, 2020.