3.5 Central Limit Theorem and Sampling Distributions

In this section, we introduce one of the fundamental results of probability theory and statistical analysis.

3.5.1 Sampling Distributions

A population is a set of similar items which are of interest in relation to some question or experiment.

In some situations, it is impossible to observe the entire set of observations that make up a population – perhaps the entire population is too large to query, or some units are out-of-reach.

In these cases, we can only hope to infer the behaviour of the entire population by considering a sample (subset) of the population.

Suppose that \(X_1,\ldots,X_n\) are \(n\) independent random variables, each having the same c.d.f. \(F\), i.e., they are identically distributed. Then, \(\{X_1,\ldots,X_n\}\) is a random sample of size \(n\) from the population with c.d.f. \(F\,.\)

Any function of such a random sample is called a statistic of the sample; the probability distribution of a statistic is called a sampling distribution.

Recall the linear properties of the expectation and the variance: if \(X\) is a random variable and \(a,b\in \mathbb{R}\), then \[\begin{aligned} \text{E}\left[ a+bX \right]&= a+b\text{E}[X]\,,\\ \text{Var}\left[ a+bX \right]&= b^2\text{Var}[X]\,,\\ \text{SD}\left[ a+bX \right]&=|b|\text{SD}[X]\,.\end{aligned}\]

Sum of Independent Random Variables

For any random variables \(X\) and \(Y\), we have \[\text{E}[X+Y]= \text{E}[X]+\text{E}[Y].\] In general, \[\text{Var}[X+Y]=\text{Var}[X]+2\text{Cov}(X,Y)+\text{Var}[Y];\] if in addition \(X\) and \(Y\) are independent, then \[\text{Var}[X+Y]=\text{Var}[X]+\text{Var}[Y].\] More generally, if \(X_1,X_2,\ldots,X_n\) are independent, then \[\text{E}\left[ \sum_{i=1}^nX_i \right]=\sum_{i=1}^n \text{E}[X_i] \quad\text{and}\quad \text{Var}\left[ \sum_{i=1}^nX_i \right]=\sum_{i=1}^n \text{Var}[X_i]\,.\]

Independent and Identically Distributed Random Variables

A special case of the above occurs when all of \(X_1,\ldots,X_n\) have exactly the same distribution. In that case we say they are independent and identically distributed, which is traditionally abbreviated to “iid”. If \(X_1,\ldots,X_n\) are iid, and \[\begin{aligned} \text{E}\left[ X_i \right] = \mu&\quad\text{and}&\text{Var}\left[ X_i \right]=\sigma^2 \quad\text{for }i=1,\ldots,n,\end{aligned}\] then \[\begin{aligned} \text{E}\left[ \sum_{i=1}^nX_i \right]=n\mu&\quad\text{and}& \text{Var}\left[ \sum_{i=1}^n X_i \right]=n\sigma^2\,.\end{aligned}\]


Examples

  • A random sample of size \(100\) is taken from a population with mean \(50\) and variance \(0.25\). Find the expected value and variance of the sample total.

    Answer: this problem translates to “if \(X_1,\ldots,X_{100}\) are iid with \(\text{E}[X_i]=\mu=50\) and \(\text{Var}[X_i]=\sigma^2=0.25\) for \(i=1,\ldots,100\), find \(\text{E}\left[\tau\right]\) and \(\text{Var}\left[\tau \right]\) for \[\tau=\sum_{i=1}^{100}X_i.\text{''}\] According to the iid formulas, \[\begin{aligned} \text{E}\left[\sum_{i=1}^{100}X_i \right]&= 100\mu=5000\,, \\ \text{Var}\left[\sum_{i=1}^{100}X_i \right] &= 100\sigma^2=25\,.\end{aligned}\]

  • The mean weight of potting mix bags is \(5\) kg, with standard deviation \(0.2\) kg. If a shop assistant carries \(4\) bags (selected independently from stock), what are the expected value and standard deviation of the total weight carried?

    Answer: there is an implicit “population” of bag weights. Let \(X_1,X_2,X_3,X_4\) be iid with \(\text{E}[X_i]=\mu=5\), \(\text{SD}[X_i]=\sigma=0.2\) and \(\text{Var}[X_i]=\sigma^2=0.2^2 = 0.04\) for \(i=1,2,3,4\). Let \(\tau=X_1+X_2+X_3+X_4\).

    According to the iid formulas, \[\begin{aligned} \text{E}[\tau]&= n\mu=4\cdot 5=20 \\ \text{Var}[\tau]&=n\sigma^2=4\cdot0.04=0.16.\end{aligned}\] Thus, \(\text{SD}[\tau]=\sqrt{0.16}=0.4\).
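The two answers above can be reproduced with a few lines of R. This is only a sketch: total_moments() is a hypothetical helper written for this illustration (it is not part of base R), and it simply applies the iid formulas \(n\mu\) and \(n\sigma^2\).

```r
# Mean, variance and SD of a sample total under the iid formulas.
# total_moments() is a hypothetical helper, not a base R function.
total_moments <- function(n, mu, sigma2) {
  c(mean = n * mu, var = n * sigma2, sd = sqrt(n * sigma2))
}

ex1 <- total_moments(n = 100, mu = 50, sigma2 = 0.25)  # sample of size 100
ex2 <- total_moments(n = 4, mu = 5, sigma2 = 0.2^2)    # four potting mix bags
print(ex1)
print(ex2)
```

The entries named mean and var of ex1 correspond to the first example, and the entries named mean and sd of ex2 correspond to the potting mix example.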

Sample Mean

The sample mean is a typical statistic of interest: \[\overline{X} = \frac1n\sum_{i=1}^nX_i\,.\] If \(X_1, \ldots, X_n\) are iid with \(\text{E}[X_i]=\mu\) and \(\text{Var}[X_i]=\sigma^2\) for all \(i=1,\ldots, n\), then \[\begin{aligned} \text{E}\left[ \overline{X} \right]&= \text{E}\left[ \frac1n \sum_{i=1}^nX_i \right]=\frac1n\text{E}\left[\sum_{i=1}^nX_i \right]=\frac1n\left( n\mu \right)=\mu\, \\ \text{Var}\left[ \overline{X} \right]&= \text{Var}\left[ \frac1n \sum_{i=1}^nX_i \right]= \frac{1}{n^2}\text{Var}\left[\sum_{i=1}^nX_i \right]=\frac{1}{n^2}\left( n\sigma^2 \right)=\frac{\sigma^2}n\,.\end{aligned}\]


Example: a set of scales returns the true weight of the object being weighed plus a random error with mean \(0\) and standard deviation \(0.1\) g. Find the standard deviation of the average of \(9\) such measurements of an object.

Answer: suppose the object has true weight \(\mu\). The “random error” indicates that each measurement \(i=1,\ldots, 9\) is written as \(X_i=\mu+Z_i\) where \(\text{E}[Z_i]=0\) and \(\text{SD}[Z_i]=0.1\) and the \(Z_i\)’s are iid.

The \(X_i\)’s are iid with \(\text{E}[X_i]=\mu\) and \(\text{SD}[X_i]=\sigma=0.1\). If we average \(X_1,\ldots,X_n\) (with \(n=9\)) to get \(\overline{X}\), then \[\begin{aligned} \text{E}\left[ \overline{X} \right] = \mu~~~\text{and}~~~ \textstyle \text{SD}\left[ \overline{X} \right]=\frac{\sigma}{\sqrt n} = \frac{0.1}{\sqrt{9}}=\frac1{30}\approx0.033\,.\end{aligned}\] We do not need to know the actual distribution of the \(X_i\); only \(\mu\) and \(\sigma^2\) are required to compute \(\text{E}[\overline{X}]\) and \(\text{Var}[\overline{X}]\).
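The fact that only \(\mu\) and \(\sigma\) matter can be checked by simulation. The sketch below makes two assumptions that are not in the example: the true weight is set to a hypothetical \(10\) g, and the errors are drawn from a uniform distribution scaled to have standard deviation \(0.1\) (any other error distribution with mean \(0\) and that standard deviation would do).

```r
set.seed(1)                    # for replicability
mu <- 10                       # hypothetical true weight (g), assumption
sigma <- 0.1                   # SD of a single measurement error (g)
n <- 9                         # measurements per average
half_width <- sigma * sqrt(3)  # Uniform(-a, a) has SD a / sqrt(3)

# 10000 averages of 9 measurements with non-normal (uniform) errors
xbar <- replicate(10000, mean(mu + runif(n, -half_width, half_width)))

sd_theory <- sigma / sqrt(n)   # formula from the text
sd_sim <- sd(xbar)             # empirical SD of the simulated averages
print(c(theory = sd_theory, simulated = sd_sim))
```

The simulated value should be close to \(1/30\), even though the errors are not normal.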

Sum of Independent Normal Random Variables

Another interesting case occurs when we have multiple independent normal random variables on the same experiment.

Suppose \(X_i\sim\mathcal{N}\left(\mu_i,\sigma_i^2\right)\) for \(i=1,\ldots,n\), and all the \(X_i\) are independent. Writing \(\tau=X_1+\cdots+X_n\) for their sum, we already know that \[\begin{aligned} \text{E}[\tau]&= \text{E}[X_1+\cdots+X_n]=\text{E}[X_1]+\cdots+\text{E}[X_n]=\mu_1+\cdots+\mu_n\,; \\ \text{Var}[\tau]&=\text{Var}[X_1+\cdots+X_n]=\text{Var}[X_1]+\cdots+\text{Var}[X_n]=\sigma^2_1+\cdots+\sigma^2_n\,.\end{aligned}\] It turns out that, under these hypotheses, \(\tau\) is also normally distributed, i.e. \[{\tau=\sum_{i=1}^nX_i \sim\mathcal{N}(\text{E}[\tau],\text{Var}[\tau])=\mathcal{N}\left( \mu_1+\cdots+\mu_n,\sigma_1^2+\cdots+\sigma_n^2 \right)}.\] Thus, if \(\{X_1,\ldots,X_n\}\) is a random sample from a normal population with mean \(\mu\) and variance \(\sigma^2\), then \(\sum_{i=1}^nX_i\) and \(\overline{X}\) are also normal, which, combined with the above work, means that \[\begin{aligned} \sum_{i=1}^nX_i\sim\mathcal{N}\left( n\mu,n\sigma^2 \right)&\quad\text{and}& \overline{X}\sim\mathcal{N}\left( \mu,\frac{\sigma^2}n \right)\,.\end{aligned}\]


Example: suppose that the population of students’ weights is normal with mean \(75\) kg and standard deviation \(5\) kg. If \(16\) students are picked at random, what is the distribution of the (random) total weight \(\tau\)? What is the probability that the total weight exceeds \(1250\) kg?

Answer: If \(X_1,\ldots,X_{16}\) are iid as \(\mathcal{N}(75,25)\), then the sum \(\tau=X_1+\cdots + X_{16}\) is also normally distributed with \[\tau=\sum_{i=1}^{16}X_i\sim\mathcal{N}(16\cdot 75,16\cdot 25)=\mathcal{N}(1200,400),\quad\text{and}\] \[Z=\frac{\tau -1200}{\sqrt{400}}\sim\mathcal{N}(0,1).\] Thus, \[\begin{aligned} P(\tau >1250) &= P\left( \frac{\tau -1200}{\sqrt{400}}>\frac{1250-1200}{20} \right)\\&=P(Z>2.5)=1-P(Z\leq2.5)\\&\approx1-0.9938=0.0062\,.\end{aligned}\]
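The same probability can be obtained directly in R, without standardizing by hand. The sketch below uses pnorm() with the mean and standard deviation of the total weight; the variable names are chosen for this illustration only.

```r
n <- 16      # number of students
mu <- 75     # population mean (kg)
sigma <- 5   # population standard deviation (kg)

# The total weight is N(n * mu, n * sigma^2); pnorm() expects the SD
p_exceed <- pnorm(1250, mean = n * mu, sd = sqrt(n) * sigma,
                  lower.tail = FALSE)
print(p_exceed)
```

Setting lower.tail = FALSE returns the upper tail \(P(\tau>1250)\), which should agree with the value of about \(0.0062\) read from the table.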

3.5.2 Central Limit Theorem

Suppose that a professor has been teaching a course for the last 20 years. For every cohort during that period, the mid-term exam grades of all the students have been recorded. Let \(X_{i,j}\) be the grade of student \(i\) in year \(j\). Looking back on the class lists, they find that \[\text{E}[X_{i,j}]=56\quad\mbox{and} \quad \text{SD}[X_{i,j}]=11.\] This year, there are \(49\) students in the class. What should the professor expect for the class mid-term exam average?

Of course, the professor cannot predict any of the student grades or the class average with absolute certainty, but they could try the following approach:

  1. simulate the results of the class of \(49\) students by generating sample grades \(X_{1,1},\ldots, X_{49,1}\) from a normal distribution \(\mathcal{N}(56,11^2)\);

  2. compute the sample mean for the sample and record it as \(\overline{X}_1\);

  3. repeat steps 1-2 \(m\) times and compute the standard deviation of the sample means \(\overline{X}_1,\ldots,\overline{X}_m\);

  4. plot the histogram of the sample means \(\overline{X}_1,\ldots,\overline{X}_m\).

What do you think is going to happen?
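The four steps can be sketched in R as follows. The simulation parameters are the historical mean \(56\) and standard deviation \(11\) quoted above; the number of repetitions m = 5000 and the seed are arbitrary choices made for this illustration.

```r
set.seed(2024)   # arbitrary seed, for replicability
n <- 49          # class size
m <- 5000        # number of simulated classes (arbitrary choice)
mu <- 56         # historical mean grade
sigma <- 11      # historical standard deviation

# steps 1-3: simulate m classes and record each class average
xbar <- replicate(m, mean(rnorm(n, mean = mu, sd = sigma)))
sd_xbar <- sd(xbar)
print(c(mean = mean(xbar), sd = sd_xbar, theory = sigma / sqrt(n)))

# step 4: histogram of the simulated class averages
hist(xbar, breaks = 30, main = "Simulated class averages",
     xlab = "class average")
```

The histogram should look bell-shaped and centred near \(56\), with a spread close to \(11/\sqrt{49}\approx 1.57\), which is much smaller than the spread of the individual grades.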

Central Limit Theorem: If \(\overline{X}\) is the mean of a random sample of size \(n\) taken from an unknown population with mean \(\mu\) and finite variance \(\sigma^2\,,\) then \[Z=\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}\sim \mathcal{N}(0,1),\] as \(n\to\infty\).

More precisely, this is a limiting result. If we view the standardization \[Z_n =\frac{\overline{X} - \mu}{\sigma/\sqrt n}\] as a function of \(n\), we have, for each \(z\), \[\begin{aligned} \lim_{n\to\infty}P\left( Z_n\leq z \right)&=\Phi(z)~~~\text{and}\\ P\left( Z_n\leq z \right)&\approx \Phi(z), \text{ if $n$ is large enough},\end{aligned}\] whether the original \(X_i\)’s are normal or not.

Figure 3.20: Illustration of the central limit theorem with a normal underlying distribution and with an exponential underlying distribution [source unknown].


Examples

  • The examination scores in a university course have mean \(56\) and standard deviation \(11\). In a class of \(49\) students, what is the probability that the average mark is below \(50\)? What is the probability that the average mark lies between \(50\) and \(60\)?

    Answer: let the marks be \(X_1,\ldots, X_{49}\) and assume the performances are independent. According to the central limit theorem, the class average \[\overline{X} = (X_1 + X_2 + \cdots + X_{49})/49\] is approximately normal, with \(\text{E}[\overline{X}]=56\) and \(\text{Var}[\overline{X}]= 11^2 /49\). We thus have \[\begin{aligned} P(\overline{X} < 50) &\approx P\left(Z < \frac{50 -56}{11/7}\right)\\&= P(Z < -3.82) \approx 0.0001\end{aligned}\] and \[\begin{aligned} P(50 < \overline{X} < 60) &\approx P\left( \frac{50 -56}{11/7} < Z < \frac{60-56}{11/7}\right)\\ &= P( -3.82 < Z < 2.55)\\&=\Phi(2.55) - \Phi(-3.82)\\&\approx 0.9945.\end{aligned}\] Note that this says nothing about whether the scores are normally distributed or not, only that the average score approximately follows a normal distribution.

  • Systolic blood pressure readings for pre-menopausal, non-pregnant women aged \(35 - 40\) have mean \(122.6\) mm Hg and standard deviation \(11\) mm Hg. An independent sample of \(25\) women is drawn from this target population and their blood pressure is recorded.

    What is the probability that the average blood pressure is greater than \(125\) mm Hg? How would the answer change if the sample size increases to \(40\)?

    Answer: according to the CLT, \(\overline{X} \sim \mathcal{N}( 122.6, 121/25)\), approximately. Thus \[\begin{aligned} P(\overline{X} > 125) & \approx P\left(Z > \frac{125-122.6}{11/\sqrt{25}}\right)\\ & = P(Z > 1.09) = 1-\Phi(1.09)\\&\approx 0.14.\end{aligned}\] However, if the sample size is \(40\), then \[\begin{aligned} P(\overline{X} > 125) & \approx P\left(Z > \frac{125-122.6}{11/\sqrt{40}}\right)\approx 0.08.\end{aligned}\] Increasing the sample size reduces the probability that the average is far from the expectation of each original measurement.

  • Suppose that we select a random sample \(X_1,\ldots,X_{100}\) from a population with mean \(5\) and variance \(0.01\).

    What is the probability that the difference between the sample mean of the random sample and the mean of the population exceeds \(0.027\)?

    Answer: according to the CLT, we know that, approximately, \(Z=\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}\) has standard normal distribution. The desired probability is thus \[\begin{aligned} P&=P(|\overline{X}-\mu|\ge 0.027)\\&=P(\overline{X}-\mu\ge 0.027 \text{ or }\mu-\overline{X}\ge 0.027)\\&=P\left(\frac{\overline{X}-5}{0.1/\sqrt{100}}\ge \frac{0.027}{0.1/\sqrt{100}}\right)\\ & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +P\left(\frac{\overline{X}-5}{0.1/\sqrt{100}}\le \frac{-0.027}{0.1/\sqrt{100}}\right)\\ &\approx P\left(Z\ge 2.7\right)+P\left(Z\le -2.7\right)\\&=2P\left(Z\ge 2.7\right)\approx 2(0.0035)=0.007.\end{aligned}\]
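The three answers above can be cross-checked numerically. The following sketch assumes the normal approximation given by the CLT throughout; clt_prob_below() is a hypothetical helper name introduced only for this illustration.

```r
# P(Xbar < x) under the CLT approximation Xbar ~ N(mu, sigma^2 / n).
# clt_prob_below() is a hypothetical helper, not a base R function.
clt_prob_below <- function(x, mu, sigma, n) {
  pnorm(x, mean = mu, sd = sigma / sqrt(n))
}

# examination scores: mu = 56, sigma = 11, n = 49
p1 <- clt_prob_below(50, 56, 11, 49)        # P(Xbar < 50)
p2 <- clt_prob_below(60, 56, 11, 49) - p1   # P(50 < Xbar < 60)

# blood pressure: mu = 122.6, sigma = 11, n = 25 and n = 40
p3 <- 1 - clt_prob_below(125, 122.6, 11, 25)
p4 <- 1 - clt_prob_below(125, 122.6, 11, 40)

# sample mean within 0.027 of mu: sigma = 0.1, n = 100 (two-sided tail)
p5 <- 2 * pnorm(0.027 / (0.1 / sqrt(100)), lower.tail = FALSE)

print(c(p1 = p1, p2 = p2, p3 = p3, p4 = p4, p5 = p5))
```

Small differences from the hand computations are expected, since the table lookups round the \(z\)-values to two decimals.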

In the next example, we illustrate how to use the Central Limit Theorem with R.


Example: a large freight elevator can transport a maximum of 9800 lbs. Suppose a load containing 49 boxes must be transported. From experience, the weight of boxes follows a distribution with mean \(\mu=205\) lbs and standard deviation \(\sigma=15\) lbs. Estimate the probability that all 49 boxes can be safely loaded onto the freight elevator and transported.

Solution: we are given \(n=49\), \(\mu=205\), and \(\sigma=15\). Let us further assume that the boxes all come from different sources (i.e. the boxes’ weights \(x_i\), \(i=1,\ldots,49\), are independent of one another).

To get a sense of the task’s feasibility, we simulate a few scenarios. Note that the problem makes no mention of the type of distribution that the weights follow.

To start, we assume that the weights are normally distributed.

set.seed(0) # to ensure replicability
x<-rnorm(49,mean=205,sd=15)

The histogram shows a distribution which is roughly normal.

brks = seq(min(x),max(x),(max(x)-min(x))/10) 
hist(x, breaks = brks)

The elevator can transport up to 9800 lbs; the \(n=49\) boxes can be transported if their total weight \[T=49w=x_1+\cdots +x_{49},\] where \(w=\overline{x}\), is less than 9800 lbs. In mathematical terms, we are interested in the value of the probability \(P(T<9800)\).

For the sample x from above, we get:

(T<-sum(x))
[1] 10066.36

and so that specific group of 49 boxes would be too heavy to carry in one trip.

Perhaps we were simply unlucky – perhaps another group of boxes would have been light enough. Let us try again, but with a different group of boxes.

set.seed(999) 
(T=sum(rnorm(49,mean=205,sd=15)))
[1] 9852.269

It’s closer, but still no cigar. However, two tries are not enough to establish a trend and to estimate \(P(T<9800)\).

We write a little function to help us find an estimate of the probability. The idea is simple: if we were to try a large number of random combinations of 49 boxes, the proportion of the attempts for which the total weight \(T\) falls below 9800 is (hopefully?) going to approximate \(P(T<9800)\).

estimate_T.normal <- function(n, T.threshold, mean, sd, num.tries){
  a=0  # counter of attempts with total weight below the threshold
  for(j in 1:num.tries){
    if(sum(rnorm(n,mean=mean,sd=sd))<T.threshold){
      a=a+1
    }
  }
  a/num.tries  # returned value: proportion of such attempts
}

What kind of inputs are these meant to be? What does this code do? Note that running this cell only defines the function estimate_T.normal(); it still needs to be called with appropriate inputs to provide an estimate for \(P(T<9800)\).

We try the experiment 10, 100, 1000, 10000, 100000, and 1000000 times (num.tries), with n=49, T.threshold=9800, mean=205, and sd=15.

(c(estimate_T.normal(49,9800,205,15,10),
estimate_T.normal(49,9800,205,15,100),
estimate_T.normal(49,9800,205,15,1000),
estimate_T.normal(49,9800,205,15,10000),
estimate_T.normal(49,9800,205,15,100000),
estimate_T.normal(49,9800,205,15,1000000)))
[1] 0.00000 0.01000 0.00700 0.00990 0.00973 0.00975

We cannot say too much from such a simple set up, but it certainly seems as though we should expect success about \(1\%\) of the time.
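The same estimate can be computed without an explicit loop. The sketch below is an alternative written for this illustration (estimate_T.matrix() is a hypothetical name, and the seed is arbitrary): it draws all the weights at once, stores one simulated load per column, and compares the column totals to the threshold.

```r
# Vectorized Monte Carlo estimate of P(T < T.threshold).
# estimate_T.matrix() is a hypothetical name used for this sketch.
estimate_T.matrix <- function(n, T.threshold, mean, sd, num.tries) {
  # one column per simulated load of n boxes
  w <- matrix(rnorm(n * num.tries, mean = mean, sd = sd), nrow = n)
  # proportion of loads whose total weight is below the threshold
  mean(colSums(w) < T.threshold)
}

set.seed(42)  # arbitrary seed, for replicability
p_hat <- estimate_T.matrix(49, 9800, 205, 15, 100000)
print(p_hat)
```

This trades memory for speed, since the whole matrix of weights is held at once. Either way, the estimate should again land near \(1\%\).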

That is a low probability, which suggests that 49 is too many boxes for the elevator to work correctly, in general, but perhaps that is only the case because we assumed normality. What happens if we use other distributions with the same mean and standard deviation, such as \(U(179.02,230.98)\) or \(\Lambda(5.32,0.0053)\)?

Let us write new functions estimate_T.unif() and estimate_T.lnorm() to repeat the previous work with those two distributions.

estimate_T.unif <- function(n, T.threshold, min, max, num.tries){
  a=0  # counter of attempts with total weight below the threshold
  for(j in 1:num.tries){
    if(sum(runif(n,min=min,max=max))<T.threshold){
      a=a+1
    }
  }
  a/num.tries  # returned value: proportion of such attempts
}

estimate_T.lnorm <- function(n, T.threshold, meanlog, sdlog, num.tries){
  a=0  # counter of attempts with total weight below the threshold
  for(j in 1:num.tries){
    if(sum(rlnorm(n,meanlog=meanlog,sdlog=sdlog))<T.threshold){
      a=a+1
    }
  }
  a/num.tries  # returned value: proportion of such attempts
}

For the uniform distribution, we obtain:

(c(estimate_T.unif(49,9800,179.0192379,230.9807621,10), 
 estimate_T.unif(49,9800,179.0192379,230.9807621,100),
 estimate_T.unif(49,9800,179.0192379,230.9807621,1000),
 estimate_T.unif(49,9800,179.0192379,230.9807621,10000),
 estimate_T.unif(49,9800,179.0192379,230.9807621,100000),
 estimate_T.unif(49,9800,179.0192379,230.9807621,1000000)))
[1] 0.000000 0.010000 0.008000 0.007900 0.010230 0.009613

For the log-normal distribution, we obtain:

(c(estimate_T.lnorm(49,9800,5.320340142,sqrt(0.005339673624),10), 
 estimate_T.lnorm(49,9800,5.320340142,sqrt(0.005339673624),100),
 estimate_T.lnorm(49,9800,5.320340142,sqrt(0.005339673624),1000),
 estimate_T.lnorm(49,9800,5.320340142,sqrt(0.005339673624),10000),
 estimate_T.lnorm(49,9800,5.320340142,sqrt(0.005339673624),100000),
 estimate_T.lnorm(49,9800,5.320340142,sqrt(0.005339673624),1000000)))
[1] 0.000000 0.000000 0.006000 0.009500 0.009060 0.009184

Under all three distributions, it appears as though \(P(T<9800)\) converges to a value near \(1\%\), even though the three distributions are very different. That might be surprising at first glance, but it is really a consequence of the Central Limit Theorem.

In effect, we are interested in estimating \(P(T<9800)=P(w<9800/49)=P(w<200)\), where \(w\) is the mean weight of the boxes.

According to the CLT, the distribution of \(w\) is approximately normal with mean \(\mu=205\) and variance \(\sigma^2/n=15^2/49\), even if the weights themselves were not normally distributed.

By subtracting the mean of \(w\) and dividing by the standard deviation we obtain a new random variable \(z\) which is approximately standard normal, i.e. \[P(w<200)\approx P\left(z<\frac{200-205}{15/7}\right).\] But

(200-205)/(15/7)
[1] -2.333333

Thus, \(P(w<200)\approx P(z<-2.33)\) and we need to find the probability that a standard normal random variable is smaller than \(-2.33\).

This can be calculated with the pnorm() function:

pnorm(-2.33, mean=0, sd=1)
[1] 0.009903076

Hence, \(P(T<9800)\approx 0.0099\), which means that it is highly unlikely that the 49 boxes can be transported in the elevator all at once.
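Rounding the \(z\)-value to \(-2.33\) is not required in R. As a sketch, pnorm() can be applied directly to the total weight or to the mean weight, using the CLT mean and standard deviation of each; the variable names below are chosen for this illustration.

```r
n <- 49       # number of boxes
mu <- 205     # mean weight of a box (lbs)
sigma <- 15   # standard deviation of a box weight (lbs)

# CLT approximation for the total: T ~ N(n * mu, n * sigma^2)
p_total <- pnorm(9800, mean = n * mu, sd = sqrt(n) * sigma)

# equivalent statement for the mean weight: w ~ N(mu, sigma^2 / n)
p_mean <- pnorm(9800 / n, mean = mu, sd = sigma / sqrt(n))

print(c(total = p_total, mean = p_mean))
```

Both calls give the same value, slightly below the one obtained with the rounded \(z\)-value, and still about \(1\%\).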


Example: what elevator threshold would be required to reach a probability of success of \(10\%\)? \(50\%\)? \(75\%\)?

Answer: the following routine approximates the probability in question without resorting to simulating the weights (that is, independently of the underlying distribution of weights) for given n, threshold, mean, and sd. Can you figure out what pnorm() is doing?

prob_T <- function(n,threshold,mean,sd){
  pnorm((threshold/n - mean)/(sd/sqrt(n)),0,1)
}

plot((prob_T(49,1:12000,205,15)))

We can find the desired thresholds by calling:

max(which(prob_T(49,1:12000,205,15)<0.1))
max(which(prob_T(49,1:12000,205,15)<0.5))
max(which(prob_T(49,1:12000,205,15)<0.75))
[1] 9910
[1] 10044
[1] 10115
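The grid search can also be replaced by inverting the normal approximation with qnorm(). This is a sketch under the same CLT approximation; threshold_for() is a hypothetical helper introduced here, solving \(P(T<t)=p\) for \(t\).

```r
# Threshold t such that P(T < t) = p under T ~ N(n * mean, n * sd^2).
# threshold_for() is a hypothetical helper, not a base R function.
threshold_for <- function(p, n, mean, sd) {
  n * mean + qnorm(p) * sd * sqrt(n)
}

thr <- threshold_for(c(0.10, 0.50, 0.75), n = 49, mean = 205, sd = 15)
print(thr)
```

The values are not integers. The search above reports the largest whole-number threshold whose probability is still strictly below the target, which is why it returns 10044 for \(50\%\) although the approximation reaches exactly \(50\%\) at \(49\cdot 205=10045\) lbs.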

3.5.3 Sampling Distributions (Reprise)

We now revisit sampling distributions.

Difference Between Two Means

Statisticians are often interested in the difference between various populations; a result akin to the central limit theorem provides guidance in that area.

Theorem: Let \(\{X_1,\ldots,X_n\}\) be a random sample from a population with mean \(\mu_1\) and variance \(\sigma_1^2\), and \(\{Y_1,\ldots,Y_m\}\) be another random sample, independent of the first, from a population with mean \(\mu_2\) and variance \(\sigma_2^2\).

If \(\overline{X}\) and \(\overline{Y}\) are the respective sample means, then \[Z=\frac{\overline{X}-\overline{Y}-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n}+\frac{\sigma_2^2}{m}}}\] has a standard normal distribution \(\mathcal{N}(0,1)\) as \(n,m\to\infty\).


Example: two different machines are used to fill cereal boxes on an assembly line. The critical measurement influenced by these machines is the weight of the product in the boxes.

The variance of these weights is the same for both machines, \(\sigma^2=1\). Each machine produces a sample of \(36\) boxes, and the weights are recorded. What is the probability that the difference between the respective averages is less than \(0.2\), assuming that the true means are identical?

Answer: we have \(\mu_1=\mu_2\), \(\sigma_1^2=\sigma_2^2=1\), and \(n=m=36\). The desired probability is \[\begin{aligned} P&\left(|\overline{X}-\overline{Y}|<0.2\right)=P\left(-0.2<\overline{X}-\overline{Y}<0.2\right)\\&=P\left(\frac{-0.2-0}{\sqrt{1/36+1/36}}<\frac{\overline{X}-\overline{Y}-(\mu_1-\mu_2)}{\sqrt{1/36+1/36}}<\frac{0.2-0}{\sqrt{1/36+1/36}}\right)\\ &=P(-0.8485<Z<0.8485)\\&\approx\Phi(0.8485)-\Phi(-0.8485)\approx 0.6.\end{aligned}\]
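The arithmetic in this example can be reproduced in R. The sketch below assumes the normal approximation from the theorem; the variable names are chosen for this illustration.

```r
sigma2 <- 1   # common variance of the weights
n <- 36       # sample size for the first machine
m <- 36       # sample size for the second machine

se_diff <- sqrt(sigma2 / n + sigma2 / m)  # SD of Xbar - Ybar
z <- 0.2 / se_diff                        # standardized bound
p_diff <- pnorm(z) - pnorm(-z)            # P(|Xbar - Ybar| < 0.2)
print(c(z = z, p = p_diff))
```

The standardized bound should match \(0.8485\), and the probability should be close to \(0.6\).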

Sample Variance \(S^2\)

When the underlying variance is unknown (which is usually the case in practice), it must be approximated by the sample variance.

Theorem: Let \(\{X_1,\ldots,X_n\}\) be a random sample taken from a normal population with variance \(\sigma^2\), and \[S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\overline{X})^2\] be the sample variance. The statistic \[\chi^2=\frac{(n-1)S^2}{\sigma^2}=\sum_{i=1}^n\frac{(X_i-\overline{X})^2}{\sigma^2}\] follows a chi-squared distribution with \(\nu=n-1\) degrees of freedom (d.f.), where \(\chi^2(\nu)\) is the gamma distribution \(\Gamma(1/2,\nu/2)\) with rate \(1/2\) and shape \(\nu/2\).

Figure 3.21: Chi-squared distribution with 8 degrees of freedom [32].

Notation: for \(0<\alpha<1\) and \(\nu \in \mathbb{N}^*\), \(\chi^2_\alpha(\nu)\) is the critical value for which \[P(\chi^2>\chi^2_\alpha(\nu))=\alpha\,,\] where \(\chi^2\sim \chi^2(\nu)\) follows a chi-squared distribution with \(\nu\) degrees of freedom.

The values of \(\chi^2_\alpha(\nu)\) can be found in various textbook tables, or by using R or specialized online calculators.

For instance, when \(\nu=8\) and \(\alpha=0.95\), we compute \(\chi^2_{0.95}(8)\) via

qchisq(0.95, df=8,lower.tail = FALSE)
[1] 2.732637

so that \(P(\chi^2>2.733)=0.95\,,\) where \(\chi^2\sim \chi^2(8)\), i.e., \(\chi^2\) has a chi-squared distribution with \(\nu=8\) degrees of freedom.

In other words, \(95\%\) of the area under the curve of the probability density function of \(\chi^2(8)\) is found to the right of \(2.733\).
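The sampling distribution of \((n-1)S^2/\sigma^2\) can be checked by simulation. The sketch below makes assumptions that are not in the text: samples of size \(n=9\) (so that \(\nu=8\)) are drawn from a normal population with mean \(0\) and a hypothetical standard deviation \(\sigma=2\), and the number of repetitions and the seed are arbitrary.

```r
set.seed(7)   # arbitrary seed, for replicability
n <- 9        # sample size, so that nu = n - 1 = 8
sigma <- 2    # hypothetical population SD (assumption)

# 20000 simulated values of the statistic (n - 1) * S^2 / sigma^2
stat <- replicate(20000, {
  x <- rnorm(n, mean = 0, sd = sigma)
  (n - 1) * var(x) / sigma^2
})

crit <- qchisq(0.95, df = n - 1, lower.tail = FALSE)  # critical value
prop_above <- mean(stat > crit)   # empirical P(statistic > crit)
print(c(crit = crit, prop_above = prop_above))
```

If the theorem holds, roughly \(95\%\) of the simulated statistics should fall to the right of the critical value.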

Sample Mean With Unknown Population Variance

Suppose that \(Z\sim \mathcal{N}(0,1)\) and \(V\sim \chi^2(\nu)\). If \(Z\) and \(V\) are independent, then the distribution of the random variable \[T=\frac{Z}{\sqrt{V/\nu}}\] is a Student \(t\)-distribution with \(\nu\) degrees of freedom, which we denote by \(T\sim t(\nu)\).

Theorem: let \(X_1,\ldots,X_n\) be independent normal random variables with mean \(\mu\) and standard deviation \(\sigma\,.\) Let \(\overline{X}\) and \(S^2\) be the sample mean and sample variance, respectively. Then the random variable \[T=\frac{\overline{X}-\mu}{S/\sqrt{n}}\sim t(n-1),\] follows a Student \(t-\)distribution with \(\nu=n-1\) degrees of freedom.

Using the same notation as with the chi-squared distribution, let \(t_\alpha(\nu)\) represent the critical \(t\)-value above which we find an area under the p.d.f. of \(t(\nu)\) equal to \(\alpha\,,\) i.e. \[P(T>t_\alpha(\nu))=\alpha\,,\] where \(T\sim t(\nu)\).

For all \(\nu\), the Student \(t\)-distribution is a symmetric distribution around zero, so we have \(t_{1-\alpha}(\nu)=-t_\alpha(\nu).\) The critical values can be found in tables, or by using the R function qt().

Figure 3.22: Student \(t-\)distribution with \(r\) degrees of freedom [32].

If \(T\sim t(\nu)\), then for any \(0<\alpha< 1\), we have \[\begin{aligned} P&\left(|T|<t_{\alpha/2}(\nu)\right)=P\left(-t_{\alpha/2}(\nu)<T<t_{\alpha/2}(\nu)\right)\\&=P\left(T<t_{\alpha/2}(\nu)\right)-P\left(T<-t_{\alpha/2}(\nu)\right)\\ &=1-P\left(T>t_{\alpha/2}(\nu)\right)-(1-P\left(T>-t_{\alpha/2}(\nu)\right))\\ &=1-P\left(T>t_{\alpha/2}(\nu)\right)-(1-P\left(T>t_{1-\alpha/2}(\nu)\right))\\ &=1-\alpha/2-(1-(1-\alpha/2))=1-\alpha.\end{aligned}\] Consequently, \[P\left(-t_{\alpha/2}(n-1)<\frac{\bar X-\mu}{S/\sqrt{n}}<t_{\alpha/2}(n-1)\right)=1-\alpha\,.\] We can show that \(t(\nu)\to \mathcal{N}(0,1)\) as \(\nu\to\infty\); intuitively, this makes sense because the estimate \(S\) gets better at estimating \(\sigma\) when \(n\) increases.


Example: in R, we can see that when \(T\sim t(8)\),

qt(0.025, df=8, lower.tail=FALSE)
[1] 2.306004

so that \(P\left( T>2.306 \right)=0.025\), i.e., \(t_{0.025}(8)=2.306\). By symmetry, this implies \[P\left(T< -2.306 \right) = 0.025\,,\] and \[\begin{aligned} P\left( |T|\leq 2.306 \right)&= P\left( -2.306\leq T\leq 2.306 \right)\\ &= 1 - P\left( T<- 2.306 \right) - P\left( T>2.306 \right) \\& =1-2P\left( T<- 2.306 \right)= 0.95\,.\end{aligned}\] The Student \(t\)-distribution will be useful when the time comes to compute confidence intervals and to do hypothesis testing (see Basics of Statistical Analysis).
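These statements can be verified with qt() and pt(). The sketch below also compares the critical values for a few arbitrarily chosen degrees of freedom with the corresponding standard normal critical value, to illustrate that \(t(\nu)\) approaches \(\mathcal{N}(0,1)\) as \(\nu\) grows.

```r
# critical value and central probability for t(8)
t_crit <- qt(0.025, df = 8, lower.tail = FALSE)
p_central <- pt(t_crit, df = 8) - pt(-t_crit, df = 8)
print(c(t_crit = t_crit, p_central = p_central))

# critical values for increasing degrees of freedom (arbitrary choices)
crit_by_df <- qt(0.025, df = c(2, 8, 30, 1000), lower.tail = FALSE)
z_crit <- qnorm(0.025, lower.tail = FALSE)  # standard normal counterpart
print(crit_by_df)
print(z_crit)
```

The critical values decrease toward the normal one as the degrees of freedom increase, which reflects the heavier tails of the \(t\)-distribution for small \(\nu\).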

3.5.3.1 \(F-\)Distributions

Let \(U\sim \chi^2(\nu_1)\) and \(V\sim \chi^2(\nu_2)\). If \(U\) and \(V\) are independent, then the random variable \[F=\frac{U/\nu_1}{V/\nu_2}\] follows an \(F\)-distribution with \(\nu_1\) and \(\nu_2\) degrees of freedom, which we denote by \(F\sim F(\nu_1,\nu_2)\).

The probability density function of \(F(\nu_1,\nu_2)\) is \[f(x)=\frac{\Gamma(\nu_1/2+\nu_2/2)(\nu_1/\nu_2)^{\nu_1/2}x^{\nu_1/2-1}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)(1+x\nu_1/\nu_2)^{\nu_1/2+\nu_2/2}},\quad x\geq 0.\]

Theorem: If \(S_1^2\) and \(S_2^2\) are the sample variances of independent random samples of size \(n\) and \(m\), respectively, taken from normal populations with variances \(\sigma_1^2\) and \(\sigma_2^2\,,\) then \[F=\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}\sim F(n-1,m-1)\] follows an \(F\)-distribution with \(\nu_1=n-1\), \(\nu_2=m-1\) d.f.

Notation: for \(0<\alpha<1\) and \(\nu_1,\nu_2\in \mathbb{N}^*\), \(f_\alpha(\nu_1,\nu_2)\) is the critical value for which \(P(F>f_\alpha(\nu_1,\nu_2))=\alpha\) where \(F\sim F(\nu_1,\nu_2)\). Critical values can be found in tables, or by using the R function \(\texttt{qf()}\).

It can be shown that \[f_{1-\alpha}(\nu_1,\nu_2)=\frac{1}{f_{\alpha}(\nu_2,\nu_1)};\] for instance, since

qf(0.95, df1=6, df2=10, lower.tail=FALSE)
[1] 0.2463077

then

\[f_{0.95}(6,10)=\frac1{f_{0.05}(10,6)}=\frac{1}{4.06}=0.246\,.\]
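The reciprocal identity can be checked directly with qf(). In this sketch, lhs and rhs are names chosen for the two sides of the identity with \(\alpha=0.05\), \(\nu_1=6\), and \(\nu_2=10\).

```r
# left side: f_{0.95}(6, 10), upper-tail area 0.95
lhs <- qf(0.95, df1 = 6, df2 = 10, lower.tail = FALSE)

# right side: 1 / f_{0.05}(10, 6), with the degrees of freedom swapped
rhs <- 1 / qf(0.05, df1 = 10, df2 = 6, lower.tail = FALSE)

print(c(lhs = lhs, rhs = rhs))
```

Note that the degrees of freedom are swapped on the right side; keeping them in the same order would not give the same value in general.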

These distributions play a role in linear regression and ANOVA models (see ANOVA).

References

[32]
R. V. Hogg and E. A. Tanis, Probability and Statistical Inference, 7th ed. Pearson/Prentice Hall, 2006.