4.3 Point and Interval Estimation

One of the goals of statistical inference is to draw conclusions about a population based on a random sample from the population.

For instance, we might want answers to the following questions.

  1. Can we assess the reliability of a product’s manufacturing process by randomly selecting a sample of the final product and determining how many of them are compliant according to some quality assessment scheme?

  2. Can we determine who will win an election by polling a small sample of respondents?

Specifically, we seek to estimate an unknown parameter \(\theta\), say, using a single quantity called the point estimate \(\hat{\theta}\).

This point estimate is obtained via a statistic, which is simply a function of a random sample.

The probability distribution of the statistic is its sampling distribution; as an example, we have discussed the sampling distribution of the sample mean in the previous section. Describing such sampling distributions is a main focus of statistical research.


Example: consider a process that manufactures gear wheels (in some standard gauge). Let \(X\) be the random variable that records the weight of a randomly selected gear wheel. What is the population mean \(\mu_X=\text{E}[X]\)?

Answer: in the absence of the p.d.f. \(f(x)\), we can estimate \(\mu_X\) with the help of a random sample \(X_1,\ldots, X_n\) of gear wheel weight measurements, via the sample mean statistic: \[\overline{X}=\frac{X_1+\cdots+X_n}{n}\,,\] which approximately follows \(\mathcal{N}\left(\mu,\sigma^2/n\right)\), according to the central limit theorem.
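For illustration, here is a minimal R sketch (with assumed values \(\mu=100\) and \(\sigma=5\), in suitable units, and \(n=30\); these parameters are hypothetical) showing that the sample means concentrate around the true mean, with spread \(\sigma/\sqrt{n}\):

    set.seed(0)                    # for replicability
    mu = 100; sigma = 5; n = 30    # hypothetical gear wheel weight parameters
    M = 5000                       # number of simulated samples
    sample.means = replicate(M, mean(rnorm(n, mu, sigma)))
    mean(sample.means)             # should be close to mu = 100
    sd(sample.means)               # should be close to sigma/sqrt(n), about 0.913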

Statistics

Common examples of statistics include:

  • the sample mean and the sample median;

  • the sample variance and the sample standard deviation;

  • sample quantiles (median, quartiles, percentiles, etc.);

  • test statistics (\(t-\)statistics, \(\chi^2-\)statistics, \(F-\)statistics, etc.);

  • order statistics (sample maximum and minimum, sample range, etc.);

  • sample moments and functions thereof (skewness, kurtosis, etc.);

  • etc.

4.3.1 Estimator (Sampling) Variance and Standard Error

In practice, the point estimator \(\hat{\theta}\) varies depending on the choice of the sample \(\{X_1,\ldots,X_n\}\).

The standard error of a statistic is the standard deviation of its sampling distribution.

For instance, if observations \(X_1,\ldots, X_n\) come from a population with unknown mean \(\mu\) and known variance \(\sigma^2\), then \(\text{Var}(\overline{X})=\sigma^2/n\) and the standard error of \(\overline{X}\) is \[\sigma_{\overline{X}}=\frac{\sigma}{\sqrt{n}}.\]

If the variance of the original population is unknown, then it is estimated by the sample variance \(S^2\) and the estimated standard error of \(\overline{X}\) is \[\begin{aligned} \hat{\sigma}_{\overline{X}}=\frac{S}{\sqrt{n}}\, ,\quad\text{where}\quad S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\overline{X})^2.\end{aligned}\]


Examples:

  1. A sample of 20 baseball player heights (in inches) is shown below.

    x=c(
      74,74,72,72,73,69,69,71,76,71,
      73,73,74,74,69,70,72,73,75,78
    )

    What is the standard error of the sample mean \(\overline{X}\)?

    Answer: the sample mean of the heights is \[\overline{X}=\frac{X_1+\cdots+X_{20}}{20}=72.6\] and the sample variance \(S^2\) is \[S^2=\frac{1}{20-1}\sum_{i=1}^{20}(X_i-72.6)^2\approx 5.6211.\]

    The standard error of \(\overline{X}\) is thus \[\hat{\sigma}_{\overline{X}}=\frac{S}{\sqrt{20}} \approx \sqrt{\frac{5.6211}{20}}\approx 0.5301.\]

    The quantities can be computed directly via R:

    (x.bar = mean(x))
    (S2.x = var(x))
    (se.x = sqrt(S2.x/length(x)))
    [1] 72.6
    [1] 5.621053
    [1] 0.530144

    Note that var() always treats the underlying dataset as a sample, not as a population: it divides by \(n-1\), not by \(n\).

  2. Consider a sample \(\{X_1,\ldots,X_{100}\}\) of independent observations selected from a normal population \(\mathcal{N}(\mu,\sigma^2)\) where \(\sigma=50\) is known, but \(\mu\) is not. What is the best estimate of \(\mu\)? What is the sampling distribution of that estimate?

    Answer: the sample mean \(\overline{X}=\frac{X_1+\cdots+X_{100}}{100}\) provides the best estimate of \(\mu_X=\mu_{\overline{X}}\) and the standard error of \(\overline{X}\) is \(\sigma_{\overline{X}}=\frac{50}{\sqrt{100}}=5\).

    Since the observations are sampled independently from a normal population with mean \(\mu\) and standard deviation \(50\), \(\overline{X}\sim \mathcal{N}(\mu,5^2)=\mathcal{N}(\mu,25)\) holds exactly; there is no need to appeal to the CLT when the underlying population is itself normal.

4.3.2 Confidence Interval for \(\mu\) When \(\sigma\) is Known

Consider a sample \(\{x_1,\ldots,x_n\}\) drawn from a normal population with known variance \(\sigma^2\) and unknown mean \(\mu\). The sample mean \[\overline{x}=\frac{x_1+\cdots+x_n}{n}\] is a point estimate of \(\mu\).41

Of course, this estimate is not exact, because \(\overline{x}\) is an observed value of \(\overline{X}\); it is unlikely that the observed value \(\overline{x}\) should coincide with \(\mu\).

We know that \(\overline{X}\sim \mathcal{N}(\mu,\sigma^2/n)\), so that \[Z=\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}\sim\mathcal{N}(0,1).\]

The \(68-96-99.7\) Rule

For the standard normal distribution, it can be shown that \[P(|Z|<1)\approx 0.683,\quad P(|Z|<2)\approx 0.955,\quad P(|Z|<3)\approx 0.997.\] This says that about 68% of the observations from \(\mathcal{N}(0,1)\) fall within one standard deviation (\(\sigma=1\)) from the mean \((\mu=0)\), about 96% within two standard deviations, and about 99.7% within three.
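These probabilities are easy to verify in R:

    k = 1:3
    pnorm(k) - pnorm(-k)  # P(|Z| < k), for k = 1, 2, 3
    [1] 0.6826895 0.9544997 0.9973002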


Figure 4.7: The 68-96-99.7 rule on the standard normal distribution. [source unknown]

In other words, whenever we observe a sample mean \(\overline{X}\) (with sample size \(n\)) from a normal population with mean \(\mu\), we would expect the inequality \[-k<Z=\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}<k\] to hold approximately \[g(k)=\begin{cases} 68.3\% \text{ of the time,} & \text{if $k=1$}\\ 95.5\% \text{ of the time,} & \text{if $k=2$}\\ 99.7\% \text{ of the time,} & \text{if $k=3$} \end{cases}\]

Confidence Intervals

By re-arranging the terms, we can build a symmetric \(g(k)\) confidence interval (C.I.) for \(\mu\): \[\overline{X}-k\frac{\sigma}{\sqrt{n}}<\mu<\overline{X}+k\frac{\sigma}{\sqrt{n}} ~~\Longrightarrow~~ \text{C.I.}(\mu;g(k))\equiv\overline{X}\pm k\frac{\sigma}{\sqrt{n}}.\]


Examples:

  1. Consider a sample \(\{X_1,\ldots, X_{64}\}\) from a normal population with known standard deviation \(\sigma=72\) and unknown mean \(\mu\). The sample mean is \(\overline{X}=375.2\). Build a symmetric \(68.3\)% confidence interval for \(\mu\).

    Answer: according to the formula, the symmetric \(68.3\)% confidence interval (\(k=1\)) for \(\mu\) would be \[\text{C.I.}(\mu;0.683)\equiv \overline{X}\pm k\frac{\sigma}{\sqrt{n}}\equiv 375.2\pm 1\cdot\frac{72}{\sqrt{64}},\] which is to say \[\text{C.I.}(\mu;0.683)\equiv (375.2-9,375.2+9)=(366.2,384.2).\]

    VERY IMPORTANT: this does not say that we are \(68.3\)% sure that the true \(\mu\) is between \(366.2\) and \(384.2\). Rather, what it says is that when a sample of size \(64\) is taken from a normal population \(\mathcal{N}(\mu,72^2)\) and a symmetric \(68.3\)% confidence interval for \(\mu\) is built, \(\mu\) will fall between the endpoints of the interval about \(68.3\)% of the time.42

  2. Build a symmetric \(95.5\)% confidence interval for \(\mu\).

    Answer: the same formula applies, with \(k=2\): \[\text{C.I.}(\mu;0.955)\equiv \overline{X}\pm k\frac{\sigma}{\sqrt{n}}\equiv 375.2\pm 2\cdot\frac{72}{\sqrt{64}},\] which is to say \[\begin{aligned} \text{C.I.}(\mu;0.955)&\equiv (375.2-18,375.2+18)\\&=(357.2,393.2).\end{aligned}\]

  3. Build a symmetric \(99.7\)% confidence interval for \(\mu\).

    Answer: again, the same formula applies, with \(k=3\): \[\text{C.I.}(\mu;0.997)\equiv \overline{X}\pm k\frac{\sigma}{\sqrt{n}}\equiv 375.2\pm 3\cdot\frac{72}{\sqrt{64}},\] which is to say \[\begin{aligned} \text{C.I.}(\mu;0.997)&\equiv (375.2-27,375.2+27)\\&=(348.2,402.2).\end{aligned}\]
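The three intervals can be computed at once in R (a small sketch using the values above):

    x.bar = 375.2; sigma = 72; n = 64
    for(k in 1:3){
      print(c(x.bar - k*sigma/sqrt(n), x.bar + k*sigma/sqrt(n)))
    }
    [1] 366.2 384.2
    [1] 357.2 393.2
    [1] 348.2 402.2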


Note that the C.I. increases in size with the confidence level. The interpretation stays the same, no matter the required confidence level or the parameter of interest.

A \(95\)% C.I. for the mean, for instance, indicates that we would expect \(19\) out of \(20\) samples from the same population to produce confidence intervals that contain the true population mean, on average.


Figure 4.8: Frequentist interpretation of confidence intervals: out of 20 experiments, we would expect the true population mean to fall in the confidence interval about 19 times, on average. [source unknown]

Confidence Interval for \(\mu\) when \(\sigma\) is Known (Reprise)

Another approach to C.I. building is to specify the proportion of the area under \(\phi(z)\) of interest, and then to determine the critical values (which is to say, the endpoints of the interval).

Let \(\{X_1,\ldots,X_n\}\) be drawn from \(\mathcal{N}(\mu,\sigma^2)\). Recall that \[{\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}\sim \mathcal{N}(0,1)}.\] For a symmetric \(95\)% C.I. for \(\mu\), we need to find \(z^*>0\) such that \(P(-z^*<Z<z^*)\approx 0.95\). But the left-hand side of this “equality” can be re-written as \[\begin{aligned} P(-z^*<Z<z^*)&=\Phi(z^*)-\Phi(-z^*)\\&=\Phi(z^*)-(1-\Phi(z^*))\\&=2\Phi(z^*)-1;\end{aligned}\]

we are thus looking for a critical value \(z^*\) such that \[0.95=2\Phi(z^*)-1 \Longrightarrow \Phi(z^*)=\frac{0.95+1}{2}=0.975.\] From any normal table (or via qnorm(0.975) in R), we see that \(\Phi(1.96)\approx 0.9750\), so that \[P(-1.96<Z<1.96)=P\left(-1.96<\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}<1.96\right)\approx 0.95.\]

In other words, the inequality \[-1.96< \frac{\overline{X}-\mu}{\sigma/\sqrt{n}} < 1.96\] holds with probability \(0.95\), or, equivalently, \[\text{C.I.}(\mu;0.95)\equiv \overline{X}\pm 1.96 \frac{\sigma}{\sqrt{n}}\] is the (symmetric) \(95\)% C.I. for \(\mu\) when \(\sigma\) is known.

A similar argument shows that \[\text{C.I.}(\mu;0.99)\equiv \overline{X}\pm 2.575 \frac{\sigma}{\sqrt{n}}\] is the (symmetric) \(99\)% C.I. for \(\mu\) when \(\sigma\) is known.


Example:

  1. A sample of size \(n=17\) is selected from a normal population with mean \(\mu=-3\) (this information is unknown to the analysts: it is what they are trying to determine) and standard deviation \(\sigma=2\), which is known. The data is shown below:

     [1] -0.4740914 -3.6524667 -0.3404015 -0.4551414 -2.1707171 -6.0799001
     [7] -4.8571341 -3.5894409 -3.0115343  1.8093068 -1.4728131 -4.5980185
    [13] -5.2953140 -3.5789231 -3.5984302 -3.8230217 -2.4955531

    Build a 95% confidence interval for \(\mu\).

    Answer: the sample mean \(\overline{x}\) is given by

    mean(x)
    [1] -2.804917

    and the corresponding 95% confidence interval is:

    lower.bound = mean(x) - 1.96*2/sqrt(17) 
    upper.bound = mean(x) + 1.96*2/sqrt(17)
    c(lower.bound,upper.bound)
    [1] -3.755657 -1.854178

    We notice that \(\mu=-3\) is indeed found in the confidence interval:

    mu = -3  # the true mean (unknown to the analysts)
    lower.bound<mu & mu<upper.bound
    [1] TRUE
  2. Repeat the process \(M=1000\) times. How often does \(\mu\) fall in the confidence interval?

    Answer: we set the seed and the problem parameters.

    set.seed(0)  # for replicability
    n=17
    mu=-3
    sigma=2
    M=1000

    Next, we initialize the vector which determines if \(\mu\) is in the C.I.:

    is.mu.in <- c() 

    and the vector which will contain the sample mean for each of the \(M=1000\) repetitions of the experiment:

    sample.means <- c() 

    Finally, we set up the repetitions: for each sample, we compute the sample mean and the confidence interval bounds, and determine if the true (unknown) value \(\mu=-3\) falls in the confidence interval or not.

    for(j in 1:M){
      x=rnorm(n,mu,sigma)
      sample.means[j] = mean(x)
      lower.bound = sample.means[j] - 1.96*sigma/sqrt(n)
      upper.bound = sample.means[j] + 1.96*sigma/sqrt(n)
      is.mu.in[j] = lower.bound<mu & mu<upper.bound
    }

    The proportion of the times when it does can thus be obtained via

    table(is.mu.in)/M
    is.mu.in
    FALSE  TRUE 
    0.055 0.945 

    This is indeed very close to 95%. We can also verify the conclusion of the CLT: look at the histogram of the sample means!

    hist(sample.means, xlim=c(-8,8))

    This differs markedly from the histogram of the sample values: for instance, the last of the \(M=1000\) samples is distributed as below:

    hist(x, xlim=c(-8,8))

    The individual sample values are much more spread out than the sample means: the former have standard deviation \(\sigma=2\), while the latter have standard error \(\sigma/\sqrt{n}\approx 0.49\).

4.3.3 Confidence Level

The confidence level \(1-\alpha\) is usually expressed in terms of a small \(\alpha\), so that \(\alpha=0.05\) corresponds to a confidence level of \(1-\alpha=0.95\).

For \(\alpha\in (0,1)\), the value \(z_{\alpha}\) for which \(P(Z>z_{\alpha})=\alpha\) is called the \(100(1-\alpha)^{\text{th}}\) quantile of the standard normal distribution. The situation is illustrated in Figure 4.9.


Figure 4.9: Quantiles of the standard normal distribution. [32]

For general \(2-\)sided confidence intervals (the ones we have been building so far), the appropriate numbers are found by solving \(P(|Z|>z^*)=\alpha\) for \(z^*\). By the properties of \(\mathcal{N}(0,1)\), \[\begin{aligned} \alpha=P(|Z|>z^*)&=1-P(-z^*<Z<z^*)\\&=1-(2\Phi(z^*)-1)\\&=2(1-\Phi(z^*)),\end{aligned}\] so that \[\Phi(z^*)=1-\alpha/2\implies z^*=z_{\alpha/2},\] as illustrated in Figure 4.10.


Figure 4.10: Two-sided quantiles of the standard normal distribution. [32]

The most commonly-used cases are for \(\alpha=0.05\) and \(\alpha=0.01\): \[\begin{aligned} P(|Z|>z_{0.025})&=0.05 \implies z_{0.025}=1.96 \\ P(|Z|>z_{0.005})&=0.01 \implies z_{0.005}=2.575. \end{aligned}\]
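These critical values can be retrieved directly in R:

    alpha = c(0.05, 0.01)
    qnorm(1 - alpha/2)  # z_{alpha/2} for alpha = 0.05, 0.01
    [1] 1.959964 2.575829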


Figure 4.11: Two-sided quantiles of the standard normal distribution, for confidence level 0.05. [source unknown]

The symmetric \(100(1-\alpha)\)% C.I. for \(\mu\) can thus generally be written as \[\text{C.I.}(\mu;1-\alpha)\equiv\overline{X}\pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}.\] For a given confidence level \(1-\alpha\), shorter confidence intervals provide more precise estimates of the mean:

  • estimates improve when the sample size \(n\) increases;

  • estimates improve when \(\sigma\) decreases.

For a given sample, if \(\alpha_1>\alpha_2\) then \[100(1-\alpha_1)\%\ \text{C.I.}\subseteq 100(1-\alpha_2)\%\ \text{C.I.}\] For instance, the \(95\)% C.I. built from a sample is always contained in the corresponding \(99\)% C.I.

If the sample comes from a normal population, then the C.I. is exact. Otherwise, if \(n\) is large, we may use the CLT and get an approximate C.I.


Examples

  • A sample of \(9\) observations from a normal population with known standard deviation \(\sigma=5\) yields a sample mean \(\overline{X}=19.93\). Provide a \(95\)% and a \(99\)% C.I. for the unknown population mean \(\mu\).

    Answer: the point estimate of \(\mu\) is the sample mean \(\overline{X}=19.93\). The \(100(1-\alpha)\)% C.I.s are \[\overline{X}\pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.\] Thus, \[\begin{aligned} \text{C.I.}(\mu;0.95)&\equiv 19.93 \pm 1.96\frac{5}{\sqrt{9}}=(16.66,23.20) \\ \text{C.I.}(\mu;0.99)&\equiv 19.93 \pm 2.575\frac{5}{\sqrt{9}}=(15.64,24.22).\end{aligned}\]

  • A sample of \(25\) observations from a normal population with known standard deviation \(\sigma=5\) yields a sample mean \(\overline{X}=19.93\). Provide a \(95\)% and a \(99\)% C.I. for the unknown population mean \(\mu\).

    Answer: the point estimate of \(\mu\) is the sample mean \(\overline{X}=19.93\). The \(100(1-\alpha)\)% C.I.s are \[\begin{aligned} \text{C.I.}(\mu;0.95)&\equiv 19.93 \pm 1.96\frac{5}{\sqrt{25}}=(17.97,21.89) \\ \text{C.I.}(\mu;0.99)&\equiv 19.93 \pm 2.575\frac{5}{\sqrt{25}}=(17.35,22.51).\end{aligned}\]

  • A sample of \(25\) observations from a normal population with known standard deviation \(\sigma=10\) yields a sample mean \(\overline{X}=19.93\). Provide a \(95\)% and a \(99\)% C.I. for the unknown population mean \(\mu\).

    Answer: the point estimate of \(\mu\) is the sample mean \(\overline{X}=19.93\). The \(100(1-\alpha)\)% C.I.s are \[\begin{aligned} \text{C.I.}(\mu;0.95)&\equiv 19.93 \pm 1.96\frac{10}{\sqrt{25}}=(16.01,23.85) \\ \text{C.I.}(\mu;0.99)&\equiv 19.93 \pm 2.575\frac{10}{\sqrt{25}}=(14.78,25.08).\end{aligned}\]

Note how the confidence intervals are affected by \(\alpha\), \(n\), and \(\sigma\).

4.3.4 Sample Size

The error \(E\) we commit by estimating \(\mu\) via the sample mean \(\overline{X}\) is smaller than \(z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\), with probability \(100(1-\alpha)\)% (in the frequentist interpretation).


Figure 4.12: Estimation error.

At this stage, if we want to control the error \(E\), the only thing we can really do is control the sample size:43 \[E> z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\implies n> \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2.\]


Examples:

  1. A sample \(\{X_1,\ldots,X_n\}\) is selected from a normal population with standard deviation \(\sigma=100\). What sample size should be used to ensure that the error on the estimate of the population mean is at most \(E=10\), at confidence level \(1-\alpha=0.95\)?

    Answer: as long as \[n>\left(\frac{z_{\alpha/2}\sigma}{E}\right)^2 = \left(\frac{z_{0.025}\cdot 100}{10}\right)^2=(19.6)^2=384.16,\] then the error committed by using \(\overline{X}\) to estimate \(\mu\) will be at most \(10\), with \(95\)% probability; since the sample size must be an integer, the smallest suitable value is \(n=385\).

  2. Repeat the first example, but with \(\sigma=10\).

    Answer: we need \[n> \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2 = \left(\frac{z_{0.025}\cdot 10}{10}\right)^2=(1.96)^2=3.8416.\]

  3. Repeat the first example, but with \(E=1\).

    Answer: we need \[n> \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2 = \left(\frac{z_{0.025}\cdot 100}{1}\right)^2=(196)^2=38416.\]

  4. Repeat the first example, but with \(\alpha=0.01\).

    Answer: we need \[n> \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2 = \left(\frac{z_{0.005}\cdot 100}{10}\right)^2=(25.75)^2=663.0625.\]
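In R, the minimum sample size can be computed with a short helper function (the name min.sample.size is ours); a small sketch reproducing the four examples above:

    min.sample.size <- function(alpha, sigma, E){
      (qnorm(1 - alpha/2)*sigma/E)^2  # n must exceed this value
    }
    min.sample.size(0.05, 100, 10)
    min.sample.size(0.05, 10, 10)
    min.sample.size(0.05, 100, 1)
    min.sample.size(0.01, 100, 10)
    [1] 384.1459
    [1] 3.841459
    [1] 38414.59
    [1] 663.4897

The small discrepancies with the hand calculations above come from using the exact quantiles rather than the rounded values \(1.96\) and \(2.575\).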

The relationship between \(\alpha\), \(\sigma\), \(E\), and \(n\) is not always intuitive, but it follows a simple rule: the required sample size grows with the square of the critical value \(z_{\alpha/2}\) and of the standard deviation \(\sigma\), and shrinks with the square of the tolerated error \(E\).

4.3.5 Confidence Interval for \(\mu\) When \(\sigma\) is Unknown

So far, we have been in the fortunate situation of sampling from a population with known variance \(\sigma^2\). What do we do when the population variance is unknown (a situation which occurs much more frequently in real-world applications)?

The solution is to estimate \(\sigma\) using the sample variance \[S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\overline{X})^2\] and the sample standard deviation \(S=\sqrt{S^2}\); we use \(\overline{X}\) instead of \(\mu\) since we do not know the value of the latter (that is indeed the parameter whose value we are trying to estimate in the first place).44

If \(\sigma\) is unknown, it can be shown that \(\frac{\overline{X}-\mu}{S/\sqrt{n}}\) follows approximately the Student \(t-\)distribution with \(n-1\) degrees of freedom, \(t(n-1)\).

Consequently, at confidence level \(1-\alpha\), we have \[P\left(-t_{\alpha/2}(n-1)<\frac{\overline{X}-\mu}{S/\sqrt{n}}<t_{\alpha/2}(n-1)\right)\approx 1-\alpha,\] where \(t_{\alpha/2}(n-1)\) is the \(100(1-\alpha/2)^{\text{th}}\) quantile of \(t(n-1)\). These can be read from pre-compiled tables or computed using the R function qt().

Thus, \[100(1-\alpha)\% \text{ C.I. for }\mu\approx \overline{X}\pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}.\]

Equality is reached if the underlying population is normal. For instance, if \(\alpha=0.05\) and \(\{X_1,X_2,X_3,X_4,X_5\}\) are samples from a normal distribution with unknown mean \(\mu\) and unknown variance \(\sigma^2\), then \(t_{0.025}(5-1)=2.776\) and \[P\left(-2.776<\frac{\overline{X}-\mu}{S/\sqrt{5}} <2.776\right) =0.95.\]
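The critical value can be verified directly in R:

    qt(1 - 0.05/2, 5 - 1)
    [1] 2.776445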


Figure 4.13: Critical value for Student distribution with 4 degrees of freedom, at confidence level 0.05. [source unknown]


Examples:

  1. For a given year, \(9\) measurements of ozone concentration are obtained: \[3.5, 5.1, 6.6, 6.0, 4.2, 4.4, 5.3, 5.6, 4.4.\] Assuming that the measured ozone concentrations follow a normal distribution with variance \(\sigma^2=1.21\), build a \(95\)% C.I. for the population mean \(\mu\). Note that \(\overline{X}=5.01\) and that \(S=0.97\).

    Answer: since the variance is known, we use the standard normal quantile \(z_{\alpha/2}=z_{0.025}=1.96:\) \[\overline{X}\pm z_{0.025}\frac{\sigma}{\sqrt{n}} = 5.01\pm 1.96\frac{\sqrt{1.21}}{\sqrt{9}}=(4.29,5.73).\]

  2. Do the same thing, this time assuming that the true variance of the underlying population is unknown.

    Answer: since variance is unknown, we use the Student quantile \(t_{\alpha/2}(n-1)=t_{0.025}(8)=2.306\): \[\overline{X}\pm t_{0.025}(n-1)\frac{S}{\sqrt{n}} = 5.01\pm 2.306\frac{0.97}{\sqrt{9}}=(4.26,5.76).\]

    The quantile value can be obtained from R using qt():

    alpha=0.05
    n=9
    qt(1-alpha/2,n-1)
    [1] 2.306004
  3. A sample of size \(n=17\) is selected from a normal population with mean \(\mu=-3\) (this information is unknown to the analysts: it is what they are trying to determine) and unknown standard deviation. The data is shown below:

     [1] -0.4740914 -3.6524667 -0.3404015 -0.4551414 -2.1707171 -6.0799001
     [7] -4.8571341 -3.5894409 -3.0115343  1.8093068 -1.4728131 -4.5980185
    [13] -5.2953140 -3.5789231 -3.5984302 -3.8230217 -2.4955531

    Build a 95% confidence interval for \(\mu\).

    Answer: the sample mean \(\overline{x}\) is given by

    mean(x)
    [1] -2.804917

    and the corresponding 95% confidence interval is:

    lower.bound = mean(x) - qt(1-0.05/2,17-1)*sd(x)/sqrt(17)  # sd(x) estimates the unknown sigma
    upper.bound = mean(x) + qt(1-0.05/2,17-1)*sd(x)/sqrt(17)
    c(lower.bound,upper.bound)
    [1] -3.869440 -1.740394

    We notice that \(\mu=-3\) is indeed found in the confidence interval:

    lower.bound<mu & mu<upper.bound
    [1] TRUE

When the underlying variance is known, the C.I. is tighter (smaller), which is only natural as we are more confident about our results when we have more information.

Note: what we have seen is that when the underlying distribution is normal, or when it is not normal but the sample size is “large” enough, we can build a C.I. for the population mean, whether the population variance is known or not.

If, however, the underlying population is not normal and the sample size is “small”, the approach used in this section cannot guarantee the C.I.’s accuracy.

4.3.6 Confidence Interval for a Proportion

If \(X\) is the number of successes in \(n\) independent trials, then \(X\sim \mathcal{B}(n,p)\), \(\text{E}[X]=np\) and \(\text{Var}[X]=np(1-p)\), and the point estimator for \(p\) is simply \(\hat{P}=\frac{X}{n}\).

Since \(X\) is a sum of iid random variables, its standardization \[Z=\frac{X-\mu}{\sigma}=\frac{n\hat{P}-np}{\sqrt{np(1-p)}}=\frac{\hat{P}-p}{\sqrt{\frac{p(1-p)}{n}}}\] is approximately \(\mathcal{N}(0,1)\), when \(n\) is large enough.

Thus, for sufficiently large \(n\), \[P\left(-z_{\alpha/2}<\frac{\hat{P}-p}{\sqrt{\frac{p(1-p)}{n}}}<z_{\alpha/2}\right)\approx 1-\alpha.\]

Using the construction presented earlier in this section, we conclude that \[\hat{P}-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}<p<\hat{P}+z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\] is an approximate \(100(1-\alpha)\)% C.I. for \(p\). However, this result is not useful in practice because \(p\) is unknown, so we use the following approximation instead: \[\hat{P}-z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}<p<\hat{P}+z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}.\]


Examples:

  1. Two candidates (\(A\) and \(B\)) are running for office. A poll is conducted: \(1000\) voters are selected randomly and asked for their preference: \(52\)% support \(A\), while \(48\)% support their rival, \(B\). Provide a \(95\)% C.I. for the support of each candidate.

    Answer: we use \(\alpha=0.05\) and \(\hat{P}=0.52\). The approximate \(95\)% C.I. for \(A\) is thus \[0.52\pm 1.96\sqrt{\frac{0.52\cdot 0.48}{1000}} \approx 0.52\pm 0.031,\] while the one for \(B\) is \(0.48 \pm 0.031\).

  2. On the strength of this polling result, a newspaper prints the following headline: “Candidate \(A\) Leads Candidate \(B\)!” Is the headline warranted?

    Answer: although there is a \(4-\)point gap in the poll numbers, the true support for candidate \(A\) is in the \(48.9\%-55.1\%\) range, and the true support for candidate \(B\) is in the \(44.9\%-51.1\%\) range, with probability \(95\)% (that is to say, \(19\) times out of \(20\)).

    Since the confidence intervals overlap, the race could well be a dead heat: the headline is not warranted.
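A small R sketch of these computations, using the approximate formula above:

    p.hat = 0.52; n = 1000; alpha = 0.05
    margin = qnorm(1-alpha/2)*sqrt(p.hat*(1-p.hat)/n)
    c(p.hat - margin, p.hat + margin)          # C.I. for A: roughly (0.489, 0.551)
    c((1-p.hat) - margin, (1-p.hat) + margin)  # C.I. for B: roughly (0.449, 0.511)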

References

[32]
R. V. Hogg and E. A. Tanis, Probability and Statistical Inference, 7th ed. Pearson/Prentice Hall, 2006.