Module 4 Introductory Statistical Analysis

by Patrick Boily, with contributions from Rafal Kulik and Shintaro Hagiwara

Loosely speaking, a statistic is any function of a sample from the distribution of a random variable; statistics aim to extract information from an observed sample to summarize the essential features of a dataset.
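
As a brief illustration (the notation \(X_1,\ldots,X_n\) for the sample is assumed here and may differ from the notation used later in the module), the sample mean and the sample variance are both statistics, since each can be computed from the observed sample alone:

\[
\overline{X}=\frac{1}{n}\sum_{i=1}^{n}X_i,\qquad S^2=\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\overline{X}\right)^2.
\]

By contrast, a quantity such as \(\overline{X}-\mu\) is not a statistic when the population mean \(\mu\) is unknown, because it cannot be evaluated from the sample by itself.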

In this module, we introduce the basics of statistics, and we show how probability theory can be used to build confidence intervals and conduct hypothesis tests, two of the fundamental tasks of statistical analysis. We also discuss various variance decompositions and multivariate statistics.

Our review of statistical methods is by necessity quite brief; further details can be found in [33], [34], [35], [36], [37], [38], [39], [31], and [32]. A fair number of the examples we provide in the rest of the module also come from those references.


4.1 Introduction

4.2 Descriptive Statistics
     4.2.1 Data Descriptions
     4.2.2 Outliers
     4.2.3 Visual Summaries
     4.2.4 Coefficient of Correlation

4.3 Point and Interval Estimation
     4.3.1 Standard Error
     4.3.2 C.I. for \(\mu\) When \(\sigma\) is Known
     4.3.3 Confidence Level
     4.3.4 Sample Size
     4.3.5 C.I. for \(\mu\) When \(\sigma\) is Unknown
     4.3.6 C.I. for a Proportion

4.4 Hypothesis Testing
     4.4.1 Hypothesis Testing in General
     4.4.2 Test Statistics and Critical Regions
     4.4.3 Test for a Mean
     4.4.4 Test for a Proportion
     4.4.5 Two-Sample Tests
     4.4.6 Difference of Two Proportions
     4.4.7 Hypothesis Testing with R

4.5 Additional Topics
     4.5.1 Analysis of Variance
     4.5.2 Analysis of Covariance
     4.5.3 Basics of Multivariate Statistics
     4.5.4 Goodness-of-Fit Tests

4.6 Exercises


R. E. Walpole, R. H. Myers, S. L. Myers, and K. Ye, Probability and Statistics for Engineers and Scientists, 8th ed. Pearson Education, 2007.
R. V. Hogg and E. A. Tanis, Probability and Statistical Inference, 7th ed. Pearson/Prentice Hall, 2006.
H. Sahai and M. I. Ageel, The Analysis of Variance: Fixed, Random and Mixed Models. Birkhäuser, 2000.
M. H. Kutner, C. J. Nachtsheim, J. Neter, and W. Li, Applied Linear Statistical Models. McGraw Hill Irwin, 2004.
M. Hollander and D. A. Wolfe, Nonparametric Statistical Methods, 2nd ed. Wiley, 1999.
P. Bruce and A. Bruce, Practical Statistics for Data Scientists: 50 Essential Concepts. O’Reilly, 2017.
D. S. Sivia and J. Skilling, Data Analysis: A Bayesian Tutorial, 2nd ed. Oxford Science, 2006.
M. L. Rizzo, Statistical Computing with R. CRC Press, 2007.
A. Reinhart, Statistics Done Wrong: The Woefully Complete Guide. No Starch Press, 2015.