3.3 Continuous Distributions

How do we approach probabilities when there are uncountably infinitely many possible outcomes, such as one might encounter if $$X$$ represents the height of an individual in the population (i.e., the outcomes reside in a continuous interval)? What is the probability that a randomly selected person is about $$6$$ feet tall, say?

3.3.1 Continuous Random Variables

In the discrete case, the probability mass function $f_X(x)=P(X=x)$ was the main object of interest. In the continuous case, the analogous role is played by the probability density function (p.d.f.), still denoted by $$f_X(x)$$, but there is a major difference with the discrete case: $f_X(x) \neq P(X=x).$ The (cumulative) distribution function (c.d.f.) of any such random variable $$X$$ is still defined by $F_X(x)=P(X\leq x)\,,$ viewed as a function of a real variable $$x$$; however, $$P(X\leq x)$$ is not simply computed by adding a few terms of the form $$P(X=x_i)$$.

Note as well that $\lim_{x\to -\infty}F_X(x)=0\quad\mbox{and}\quad\lim_{x\to +\infty}F_X(x)=1.$ We can describe the distribution of the random variable $$X$$ via the following relationship between $$f_X(x)$$ and $$F_X(x)$$: $f_X(x)=\frac{d}{dx}F_X(x);$ in the continuous case, probability theory is simply an application of calculus!

Area Under the Curve

For any $$a<b$$, we have $\left\{ X\leq b \right\} = \left\{ X\leq a \right\}\cup \left\{ a<X\leq b \right\},$ so that \begin{aligned} P\left( X\leq a \right)+P\left( a<X\leq b \right) &= P\left( X\leq b \right)\end{aligned} and thus \begin{aligned} P\left( a<X\leq b \right)&= P\left( X\leq b \right)- P\left( X\leq a \right)\\ &= F_X(b)-F_X(a)=\int_a^b f_X(x)\, dx\end{aligned}

Probability Density Function

The probability density function (p.d.f.) of a continuous random variable $$X$$ is an integrable function $$f_X: X(\mathcal{S})\to \mathbb{R}$$ such that:

• $$f_X(x)>0$$ for all $$x\in X(\mathcal{S})$$ and $$\displaystyle{\lim_{x\to \pm \infty}f_X(x)=0}$$;

• $$\int_{X(\mathcal{S})}f_X(x)\, dx=1$$;

• for any event $$A=(a,b)=\{X|a<X<b\}$$, $P(A)=P((a,b))=\int_a^b f_X(x)\, dx,$

and the cumulative distribution function (c.d.f.) $$F_X$$ is given by $F_X(x)=P(X\leq x)=\int_{-\infty}^xf_X(t)\, dt.$ Unlike discrete distributions, the absence or presence of endpoints does not affect the probability computations for continuous distributions: for any $$a,b$$, $P(a<X<b)=P(a\leq X<b)=P(a<X\leq b)=P(a\leq X\leq b),$ all taking the value $F_X(b)-F_X(a)=\int_a^bf(x)\, dx.$ Furthermore, for any $$x$$, $P(X>x) = 1-P(X\leq x)=1-F_X(x)=1-\int_{-\infty}^xf_X(t)\, dt;$ and for any $$a$$, $P\left( X=a \right)= P\left( a\leq X\leq a \right)= \int_a^{a} f_X(x)\,dx=0.$ That last result explains why it is pointless to speak of the probability of a random variable taking on a specific value in the continuous case; rather, we are interested in ranges of values.

Examples

• Assume that $$X$$ has the following p.d.f.: $f_X(x)=\begin{cases}0 & \text{if } x<0 \\ x/2 & \text{if } 0\leq x\leq 2 \\ 0 & \text{if } x>2\end{cases}$ Note that $$\int_{0}^2f(x)\, dx =1.$$ The corresponding c.d.f. is given by: \begin{aligned} F_X(x)&=P(X\leq x)=\int_{-\infty}^x f_X(t)\, dt \\ &=\begin{cases} 0 & \text{if } x<0 \\ \frac{1}{2}\int_{0}^x t\, dt = x^2/4 & \text{if } 0\leq x<2 \\ 1 & \text{if } x\geq 2 \end{cases}\end{aligned}

The p.d.f. and the c.d.f. for this r.v. are shown in Figure 3.8.
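As a quick numerical sanity check of this example (a Python sketch, whereas the chapter's own computations use R), we can verify that the density integrates to $$1$$ and that the c.d.f. matches $$x^2/4$$ on $$[0,2]$$:

```python
# Midpoint Riemann sum check of the example p.d.f. f_X(x) = x/2 on [0, 2];
# a Python sketch (the chapter's own code is in R).

def f(x):
    """p.d.f. of the example: x/2 on [0, 2], 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0

def F(x, n=100_000):
    """c.d.f. via a midpoint Riemann sum of f over [0, x], for x > 0."""
    h = x / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

total_area = F(2.0)   # should be ~1
F_at_1 = F(1.0)       # should be ~1^2/4 = 0.25
```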

• What is the probability of the event $A=\{X|0.5<X<1.5\}?$

Answer: we need to evaluate \begin{aligned} P(A)&=P(0.5<X<1.5)=F_X(1.5)-F_X(0.5)\\ &=\frac{(1.5)^2}{4}-\frac{(0.5)^2}{4} =\frac{1}{2}. \end{aligned}

• What is the probability of the event $$B=\{X|X=1\}$$?

Answer: we need to evaluate $P(B) = P(X=1)=P(1\leq X\leq 1)=F_{X}(1)-F_X(1)=0.$ This is not unexpected: even though $$f_X(1)=0.5\neq 0$$, $$P(X=1)=0$$, as we saw earlier.

• Assume that, for $$\lambda>0$$, $$X$$ has the following p.d.f.: $f_X(x)=\begin{cases} \lambda\exp(-\lambda x) & \text{if } x\geq 0\\ 0&\text{if } x<0 \end{cases}$ Verify that $$f_X$$ is a p.d.f. for all $$\lambda>0$$, and compute the probability that $$X>10.2$$.

Answer: that $$f_X(x)\geq 0$$ for all $$x$$ is obvious; the only work goes into showing that \begin{aligned} \int_{-\infty}^{\infty}&f(x)\, dx =\int_{0}^{\infty}\lambda\exp(-\lambda x)\, dx\\&=\lim_{b\to\infty}\int_{0}^{b}\lambda\exp(-\lambda x)\, dx\\&=\lim_{b\to\infty}\lambda \left[\frac{\exp(-\lambda x)}{-\lambda}\right]_0^b =\lim_{b\to\infty}\left[-\exp(-\lambda x)\right]_0^b\\ &=\lim_{b\to\infty}\left[-\exp(-\lambda b)+\exp(0)\right]=1.\end{aligned} The corresponding c.d.f. is given by: \begin{aligned} F_X(x;\lambda)&=P_{\lambda}(X\leq x)=\int_{-\infty}^{x}f_X(t)\, dt\\&=\begin{cases} 0 & \text{if } x<0 \\ \lambda\int_0^x\exp(-\lambda t)\, dt & \text{if } x\geq 0\end{cases} \\ & = \begin{cases} 0 & \text{if } x<0 \\ [-\exp(-\lambda t)]_0^x & \text{if } x\geq 0 \end{cases} \\ &= \begin{cases} 0 & \text{if } x<0 \\ 1-\exp(-\lambda x) & \text{if } x\geq 0 \end{cases}\end{aligned} Then \begin{aligned} P_{\lambda}(X>10.2)&=1-F_X(10.2;\lambda)=1-[1-\exp(-10.2\lambda)]\\&=\exp(-10.2\lambda)\end{aligned} is a function of the distribution parameter $$\lambda$$ itself:

| $$\lambda$$ | $$P_{\lambda}(X>10.2)$$ |
|---:|---:|
| $$0.002$$ | $$0.9798$$ |
| $$0.02$$ | $$0.8155$$ |
| $$0.2$$ | $$0.1300$$ |
| $$2$$ | $$1.38 \times 10^{-9}$$ |
| $$20$$ | $$2.54 \times 10^{-89}$$ |
| $$200$$ | $$0$$ (for all intents and purposes) |
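The tabulated tail probabilities are simply evaluations of $$\exp(-10.2\lambda)$$; a quick check (sketched in Python):

```python
import math

# P_lambda(X > 10.2) = exp(-10.2 * lambda) for the tabulated rates;
# the results should match the table to the digits shown.
lambdas = [0.002, 0.02, 0.2, 2, 20, 200]
tail = {lam: math.exp(-10.2 * lam) for lam in lambdas}
```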

For $$\lambda=0.2$$, for instance, the p.d.f. and c.d.f. are:

The probability that $$X>10.2$$ is the area (to $$\infty$$) in blue, below.

For $$\lambda=2$$, the probability is so small ($$1.38\times 10^{-9}$$) that it does not even appear in the p.d.f. (see below).

Note that in all cases, the shapes of the p.d.f. and the c.d.f. are the same (the spike when $$\lambda=2$$ is much higher than the one when $$\lambda=0.2$$ – why must that be the case?). This is not a general property of distributions, however, but a property of this specific family of distributions.

3.3.2 Expectation of a Continuous Random Variable

For a continuous random variable $$X$$ with p.d.f. $$f_X(x)$$, the expectation of $$X$$ is defined as $\text{E}[X]=\int_{-\infty}^\infty x f_X(x)\,dx\,.$ For any function $$h(X)$$, we can also define $\text{E}\left[ h(X) \right] = \int_{-\infty}^\infty h(x) f_X(x)\,dx\,.$

Examples:

• Find $$\text{E}[X]$$ and $$\text{E}[X^2]$$ in the first example, above.

Answer: we need to evaluate \begin{aligned} \text{E} [X]&=\int_{-\infty}^{\infty}xf_X(x)\, dx=\int_0^2xf_X(x)\,dx \\ &=\int_0^2\frac{x^2}{2}\, dx = \left[\frac{x^3}{6}\right]_{x=0}^{x=2}=\frac{4}{3};\\ \text{E}[X^2]&=\int_0^2\frac{x^3}{2}\, dx=2.\end{aligned}
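These two integrals are easy to confirm numerically; a small Python sketch using midpoint Riemann sums:

```python
# Numerical check of E[X] = 4/3 and E[X^2] = 2 for the p.d.f. f_X(x) = x/2
# on [0, 2] (Python sketch; the chapter's own code is in R).

def E(h, n=200_000):
    """E[h(X)] = integral over [0, 2] of h(x) * (x/2) dx, midpoint rule."""
    dx = 2.0 / n
    return sum(h((i + 0.5) * dx) * ((i + 0.5) * dx) / 2 for i in range(n)) * dx

mean = E(lambda x: x)                  # ~ 4/3
second_moment = E(lambda x: x * x)     # ~ 2
variance = second_moment - mean ** 2   # ~ 2 - 16/9 = 2/9
```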

• Note that the expectation need not exist! Compute the expectation of the random variable $$X$$ with p.d.f. $f_X(x)=\frac{1}{\pi(1+x^2)}, \quad-\infty<x<\infty.$

Answer: let’s verify that $$f_X(x)$$ is indeed a p.d.f.: \begin{aligned} \int_{-\infty}^{\infty}f_X(x)\, dx&= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{1+x^2}\, dx \\&= \frac{1}{\pi}[\arctan(x)]^{\infty}_{-\infty}=\frac{1}{\pi}\left[\frac{\pi}{2}+\frac{\pi}{2}\right]=1. \end{aligned}

We can also easily see that \begin{aligned} F_X(x)&=P(X\leq x)=\int_{-\infty}^xf_X(t)\, dt\\& =\frac{1}{\pi}\int_{-\infty}^x\frac{1}{1+t^2}\, dt=\frac{1}{\pi}\arctan(x)+\frac{1}{2}.\end{aligned} For instance, $$P(X\leq 3)=\frac{1}{\pi}\arctan(3)+\frac{1}{2}\approx 0.8976$$.

The expectation of $$X$$ is \begin{aligned} \text{E}[X]&=\int_{-\infty}^{\infty}xf_X(x)\, dx = \int_{-\infty}^{\infty}\frac{x}{\pi(1+x^2)}\, dx.\end{aligned} If this improper integral exists, then it needs to be equal both to $\underbrace{\int_{-\infty}^0\frac{x}{\pi(1+x^2)}\, dx + \int_0^{\infty}\frac{x}{\pi(1+x^2)}\, dx}_{\text{candidate 1}}$ and to the Cauchy principal value $\underbrace{\lim_{a\to\infty}\int_{-a}^a\frac{x}{\pi(1+x^2)}\, dx}_{\text{candidate 2}}.$ But it is straightforward to find an antiderivative of $$\frac{x}{\pi(1+x^2)}.$$ Set $$u=1+x^2$$. Then $$du=2x\,dx$$ and $$x\,dx=\frac{du}{2}$$, and we obtain $\int \frac{x}{\pi(1+x^2)}\, dx=\frac{1}{2\pi}\int \frac{du}{u}=\frac{1}{2\pi}\ln|u|=\frac{1}{2\pi}\ln(1+x^2).$ Then the candidate $$2$$ integral reduces to \begin{aligned} \lim_{a\to\infty}\left[\frac{\ln(1+x^2)}{2\pi}\right]_{-a}^a&=\lim_{a\to\infty}\left[\frac{\ln(1+a^2)}{2\pi}-\frac{\ln(1+(-a)^2)}{2\pi}\right] \\ &=\lim_{a\to\infty}0=0;\end{aligned} while the candidate $$1$$ integral reduces to $\left[\frac{\ln(1+x^2)}{2\pi}\right]^0_{-\infty}+\left[\frac{\ln(1+x^2)}{2\pi}\right]^{\infty}_0 =0-(\infty)+\infty-0=\infty-\infty,$ which is undefined. Thus $$\text{E}[X]$$ cannot exist, as it would otherwise have to be both equal to $$0$$ and undefined simultaneously.
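The behaviour of the two candidates can also be seen numerically, using the antiderivative $$\ln(1+x^2)/(2\pi)$$; a Python sketch:

```python
import math

# The principal value integral of x/(pi(1+x^2)) over [-a, a] is 0 for every
# cutoff a, but each one-sided integral over [0, a] grows like
# ln(1+a^2)/(2*pi) without bound -- which is why E[X] does not exist.

def one_sided(a):
    """Integral of x/(pi*(1+x^2)) from 0 to a, via the antiderivative."""
    return math.log(1 + a * a) / (2 * math.pi)

principal_values = [one_sided(a) - one_sided(a) for a in (10, 100, 1000)]
one_sided_values = [one_sided(a) for a in (10, 100, 1000)]
```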

Mean and Variance

In a similar way to the discrete case, the mean of $$X$$ is defined to be $$\text{E}[X]$$, and the variance and standard deviation of $$X$$ are, as before, \begin{aligned} \text{Var}[X]&\stackrel{\text{def}}= \text{E}\left[( X-\text{E}[X])^2 \right] =\text{E}[X^2]- \text{E}^2[X]\,, \\ \text{SD}[X]&=\sqrt{\text{Var}[X]}\,.\end{aligned} As in the discrete case, if $$X,Y$$ are continuous random variables and $$a,b\in\mathbb{R}$$, then \begin{aligned} \text{E}[aY+bX]&= a\text{E}[Y]+b\text{E}[X]\\ \text{Var}[ a+bX ]&= b^2\text{Var}[X]\\ \text{SD}[ a+bX ]&=|b|\text{SD}[X]\end{aligned} The interpretations of the mean as a measure of centrality and of the variance as a measure of dispersion are unchanged in the continuous case.

For the time being, however, we cannot easily compute the variance of a sum $$X+Y$$, unless $$X$$ and $$Y$$ are independent random variables, in which case $\text{Var}[X+Y]= \text{Var}[X]+\text{Var}[Y].$

3.3.3 Normal Distributions

A very important example of a continuous distribution is provided by the probability density function $\phi(z)=\frac1{\sqrt{2\pi}}e^{-z^2/2}\,.$ The corresponding cumulative distribution function is denoted by $\Phi(z)=P(Z\leq z)=\int_{-\infty}^z \phi(t)\,dt\,.$ A random variable $$Z$$ with this c.d.f. is said to have a standard normal distribution, denoted by $$Z\sim\mathcal N(0,1)$$.

Standard Normal Random Variable

The expectation and variance of $$Z\sim\mathcal{N}(0,1)$$ are \begin{aligned} \text{E}[Z]&= \int_{-\infty}^\infty z\, \phi(z)\, dz = \int_{-\infty}^{\infty}z\,\frac1{\sqrt{2\pi}} e^{-\frac12 z^2}\,dz =0, \\ \text{Var}[Z]&=\int_{-\infty}^{\infty}z^2\, \phi(z)\, dz = 1, \\ \text{SD}[Z]&=\sqrt{\text{Var}[Z]}=\sqrt{1}=1.\end{aligned} Other quantities of interest include: \begin{aligned} \Phi(0)&=P(Z\leq 0)=\frac{1}{2},\quad \Phi(-\infty)=0,\quad \Phi(\infty)=1,\\ \Phi(1)&=P(Z\leq 1)\approx 0.8413, \quad \text{etc.}\end{aligned}
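There is no elementary closed form for $$\Phi$$, but it can be expressed through the error function as $$\Phi(z)=\frac{1}{2}\left(1+\operatorname{erf}(z/\sqrt{2})\right)$$; a Python sketch checking the quantities above (in R, pnorm plays this role):

```python
import math

# Standard normal c.d.f. via the error function:
# Phi(z) = (1 + erf(z / sqrt(2))) / 2.

def Phi(z):
    return (1 + math.erf(z / math.sqrt(2))) / 2

phi_0 = Phi(0)   # = 1/2, by symmetry
phi_1 = Phi(1)   # ~ 0.8413
```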

Normal Random Variables

Let $$\sigma>0$$ and $$\mu \in \mathbb{R}$$.

If $$Z\sim\mathcal{N}(0,1)$$ and $$X=\mu+\sigma Z$$, then $\frac{X-\mu}\sigma = Z \sim\mathcal{N}(0,1).$ Thus, the c.d.f. of $$X$$ is given by \begin{aligned} F_X(x)&=P( X\leq x ) \\&= P( \mu+\sigma Z\leq x ) =P\left( Z\leq \frac{x-\mu}\sigma \right)\\&= \Phi\left( \frac{x-\mu}\sigma \right)\,;\end{aligned} its p.d.f. must then be \begin{aligned} f_X(x)&=\frac d{dx}F_X(x) \\&= \frac d{dx} \Phi\left( \frac{x-\mu}\sigma \right)\\ &= \frac1\sigma\, \phi\left( \frac{x-\mu}\sigma \right).\end{aligned} Any random variable $$X$$ with this c.d.f./p.d.f. satisfies \begin{aligned} \text{E}[X]&=\mu+\sigma\text{E}[Z]=\mu,\\ \text{Var}[X]&=\sigma^2\text{Var}[Z]=\sigma^2, \\ \text{SD}[X]&=\sigma\end{aligned} and is said to be normal with mean $$\mu$$ and variance $${\sigma^2}$$, denoted by $$X\sim\mathcal{N}(\mu,\sigma^2)$$.

As it happens, every general normal $$X$$ can be obtained by a linear transformation of the standard normal $$Z$$. Traditionally, probability computations for normal distributions are done with tables which compile values of the standard normal distribution c.d.f., such as the one found in [31]. With the advent of freely-available statistical software, the need for tabulated values has decreased.

In R, the standard normal c.d.f. $$F_Z(z)=P(Z\leq z)$$ can be computed with the function pnorm(z) – for instance, pnorm(0)=0.5. (In the example below, whenever $$P(Z\leq a)$$ is evaluated for some $$a$$, the value is found either by consulting a table or using pnorm.)

Examples

• Let $$Z$$ represent the standard normal random variable. Then:

1. $$P(Z\le 0.5)=0.6915$$

2. $$P(Z<-0.3)=0.3821$$

3. $$P(Z>0.5)=1-P(Z\le 0.5)=1-0.6915=0.3085$$

4. $$P(0.1<Z<0.3)=P(Z<0.3)-P(Z<0.1)=0.6179-0.5398=0.0781$$

5. $$P(-1.2<Z<0.3)=P(Z<0.3)-P(Z<-1.2)=0.5028$$
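Each of these five values can be reproduced directly; a Python sketch in which an erf-based $$\Phi$$ stands in for a table or R's pnorm:

```python
import math

# Reproducing the five standard normal probabilities above.

def Phi(z):
    """Standard normal c.d.f. via the error function."""
    return (1 + math.erf(z / math.sqrt(2))) / 2

p1 = Phi(0.5)              # P(Z <= 0.5)       ~ 0.6915
p2 = Phi(-0.3)             # P(Z < -0.3)       ~ 0.3821
p3 = 1 - Phi(0.5)          # P(Z > 0.5)        ~ 0.3085
p4 = Phi(0.3) - Phi(0.1)   # P(0.1 < Z < 0.3)  ~ 0.0781
p5 = Phi(0.3) - Phi(-1.2)  # P(-1.2 < Z < 0.3) ~ 0.5028
```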

• Suppose that the waiting time (in minutes) in a coffee shop at 9am is normally distributed with mean $$5$$ and standard deviation $$0.5$$. What is the probability that the waiting time for a customer is at most $$6$$ minutes?

Answer: let $$X$$ denote the waiting time.

Then $$X\sim\mathcal{N}(5,0.5^2)$$ and the standardised random variable is a standard normal: $Z=\frac{X-5}{0.5} \sim\mathcal{N}(0,1)\,.$ The desired probability is \begin{aligned} P\left(X\leq6 \right) &= P\left( \frac{X-5}{0.5}\leq \frac{6-5}{0.5} \right)\\ & = P\left(Z\le \frac{6-5}{0.5} \right)= \Phi\left( \frac{6-5}{0.5} \right)\\&=\Phi(2)=P(Z\leq 2)\approx 0.9772 .\end{aligned}
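The standardisation can be verified with a short sketch (Python, using the erf-based expression for $$\Phi$$ in place of R's pnorm):

```python
import math

# P(X <= 6) for X ~ N(5, 0.5^2): standardise, then evaluate Phi(2).

def Phi(z):
    return (1 + math.erf(z / math.sqrt(2))) / 2

z_score = (6 - 5) / 0.5   # = 2
p = Phi(z_score)          # ~ 0.9772
```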

• Suppose that bottles of beer are filled in such a way that the actual volume of the liquid content (in mL) varies randomly according to a normal distribution with $$\mu=376.1$$ and $$\sigma=0.4$$. What is the probability that the volume in any randomly selected bottle is less than $$375$$mL?

Answer: let $$X$$ denote the volume of the liquid in the bottle. Then \begin{aligned} X\sim\mathcal{N}(376.1,0.4^2)\implies Z=\frac{X-376.1}{0.4}\sim\mathcal{N}(0,1)\,.\end{aligned} The desired probability is thus \begin{aligned} P\left( X<375 \right) &= P\left( \frac{X-376.1}{0.4}<\frac{375-376.1}{0.4} \right) \\ &=P\left( Z<\frac{-1.1}{0.4} \right)\\&=P(Z\leq -2.75)=\Phi\left( -2.75 \right)\approx 0.003\,.\end{aligned}

• If $$Z\sim\mathcal{N}(0,1)$$, for which values $$a$$, $$b$$ and $$c$$ do:

1. $$P(Z\leq a)=0.95$$;

2. $$P(|Z|\le b)=P(-b\leq Z\leq b)=0.99$$;

3. $$P(|Z|\geq c)=0.01$$.

1. From the table (or R) we see that $P(Z\leq 1.64)\approx 0.9495,\ P(Z\leq 1.65)\approx 0.9505\,.$ Clearly we must have $$1.64<a<1.65$$; a linear interpolation provides a decent guess at $$a\approx 1.645$$.

2. Note that $P\left( -b\leq Z\leq b \right)=P(Z\leq b)-P(Z<-b).$ However, the p.d.f. $$\phi(z)$$ is symmetric about $$z=0$$, which means that $P(Z<-b)=P(Z>b)=1-P(Z\leq b),$ so that \begin{aligned} P\left( -b\leq Z\leq b \right)&=P(Z\leq b)-\left[ 1-P(Z\leq b) \right]\\& =2P(Z\leq b)-1.\end{aligned} In the question, $$P(-b\le Z \le b)=0.99$$, so that \begin{aligned} 2P(Z\le b)-1=0.99\implies \ P(Z\leq b)=\frac{1+0.99}2 = 0.995\,.\end{aligned} Consulting the table we see that $P(Z\leq 2.57)\approx 0.9949,\ P(Z\leq 2.58)\approx 0.9951;$ a linear interpolation suggests that $$b\approx2.575$$.

3. Note that $$\left\{ |Z|\geq c \right\}=\left\{ |Z|<c \right\}^c$$, so we need to find $$c$$ such that \begin{aligned} P\left(|Z|<c \right)=1-P\left( |Z|\geq c \right) = 0.99.\end{aligned} But this is equivalent to \begin{aligned} P\left( -c<Z<c \right)=P(-c\leq Z\leq c)=0.99\end{aligned} as $$|x|<y \Leftrightarrow -y<x<y$$, and $$P(Z=c)=0$$ for all $$c$$. This problem was solved in part 2; set $$c\approx 2.575$$.
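The three cutoffs can also be recovered by inverting $$\Phi$$ numerically (in R, qnorm does this directly); a Python sketch using bisection:

```python
import math

# Inverting the standard normal c.d.f. by bisection (Phi is increasing).

def Phi(z):
    return (1 + math.erf(z / math.sqrt(2))) / 2

def Phi_inv(p, lo=-10.0, hi=10.0):
    """Solve Phi(z) = p on [lo, hi] by bisection."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = Phi_inv(0.95)    # ~ 1.645
b = Phi_inv(0.995)   # ~ 2.576, so that P(|Z| <= b) = 0.99
c = b                # P(|Z| >= c) = 0.01 yields the same cutoff
```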

Normally distributed numbers can be generated by rnorm() in R, which accepts three parameters: n, mean, and sd. The default parameter values are mean=0 and sd=1.

We can draw a single number from $$\mathcal{N}(0,1)$$ as follows:

rnorm(1) 
[1] -0.2351372

We can generate a histogram of a sample of size 500, say, from $$\mathcal{N}(0,1)$$ as follows:

z<-rnorm(500)
hist(z)

A histogram with 20 bins is shown below:

brks = seq(min(z),max(z),(max(z)-min(z))/20)
hist(z, breaks = brks)

For normal distributions with mean $$\mu$$ and standard deviation $$\sigma$$, we need to modify the call to rnorm(). For instance, we can draw 5000 observations from $$\mathcal{N}(-2,3^2)$$ using the following code:

w<-rnorm(5000, sd=3, mean=-2)
mean(w)
sd(w)
[1] -1.943782
[1] 2.920071

A histogram with 50 bins is displayed below:

brks = seq(min(w),max(w),(max(w)-min(w))/50)
hist(w, breaks = brks)

3.3.4 Exponential Distributions

Assume that cars arrive according to a Poisson process with rate $$\lambda$$, that is, the number of cars arriving within a fixed unit time period is a Poisson random variable with parameter $$\lambda$$.

Over a period of time $$x$$, we would then expect the number of arrivals $$N$$ to follow a Poisson distribution with parameter $$\lambda x$$. Let $$X$$ be the wait time to the first car arrival. Then $P(X>x)=1-P(X\leq x)=P(N=0)=\exp(-\lambda x).$ We say that $$X$$ follows an exponential distribution $$\text{Exp}(\lambda)$$: \begin{aligned} F_X(x)&=\begin{cases} 0 & \text{for } x<0 \\ 1-e^{-\lambda x} & \text{for } x\geq 0 \end{cases} \\ f_X(x)&=\begin{cases} 0 & \text{for } x<0 \\ \lambda e^{-\lambda x} & \text{for } x\geq 0 \end{cases} \end{aligned} Note that $$f_X(x)=F'_X(x)$$ for all $$x\neq 0$$.

If $$X\sim\text{Exp}(4)$$, then $$P(X< 0.5)=F_X(0.5)=1-e^{-4(0.5)}\approx 0.865$$ is the area of the shaded region in Figure 3.16, below.
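The shaded area comes straight from the closed-form c.d.f.; a one-line Python check:

```python
import math

# P(X < 0.5) for X ~ Exp(4), via F_X(x) = 1 - exp(-lambda * x).
p = 1 - math.exp(-4 * 0.5)   # ~ 0.865
```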

Properties

If $$X\sim\text{Exp}(\lambda)$$, then

• $$\mu=\text{E} [X]=1/\lambda$$, since \begin{aligned} \mu&=\int_{0}^{\infty}x\lambda e^{-\lambda x}\, dx=\left[-\frac{\lambda x+1}{\lambda}e^{-\lambda x}\right]_{0}^{\infty} \\ &=\left[0+\frac{\lambda(0)+1}{\lambda}e^{-0}\right]\\&=\frac{1}{\lambda}; \end{aligned}

• $$\sigma^2=\text{Var} [X]=1/\lambda^2$$, since \begin{aligned} \sigma^2&=\int_{0}^{\infty}\left(x-\text{E}[X]\right)^2\lambda e^{-\lambda x}\, dx\\&=\int_{0}^{\infty}\left(x-\frac{1}{\lambda}\right)^2\lambda e^{-\lambda x}\, dx\\&=\left[-\frac{\lambda^2 x^2+1}{\lambda^2}e^{-\lambda x}\right]_{0}^{\infty} \\ &=\left[0+\frac{\lambda^2(0)^2+1}{\lambda^2}e^{-0}\right]\\&=\frac{1}{\lambda^2}; \end{aligned}

• and $$P(X>s+t\mid X>t)=P(X>s),$$ for all $$s,t>0$$, since \begin{aligned} P(X>s+t&\mid X>t)= \frac{P(X>s+t \text{ and } X>t)}{P(X>t)} \\&=\frac{P(X>s+t)}{P(X>t)}=\frac{1-F_X(s+t)}{1-F_X(t)} \\&=\frac{\exp(-\lambda (s+t))}{\exp(-\lambda t)}\\& =\exp(-\lambda s)=P(X>s)\end{aligned} (we say that exponential distributions are memoryless).

In a sense, $$\text{Exp}(\lambda)$$ is the continuous analogue to the geometric distribution $$\text{Geo}(p)$$.
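The mean and variance formulas above can be double-checked by numerical integration; a Python sketch with $$\lambda=2$$ as an arbitrary test value (the integrals are truncated where the tail is negligible):

```python
import math

# Check E[X] = 1/lambda and Var[X] = 1/lambda^2 for Exp(lambda), lambda = 2,
# via midpoint Riemann sums over [0, 40] (the remaining tail is ~e^{-80}).

lam = 2.0
n, upper = 400_000, 40.0
dx = upper / n
xs = [(i + 0.5) * dx for i in range(n)]
pdf = [lam * math.exp(-lam * x) for x in xs]

mean = sum(x * p for x, p in zip(xs, pdf)) * dx                # ~ 1/2
var = sum((x - mean) ** 2 * p for x, p in zip(xs, pdf)) * dx   # ~ 1/4
```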

Example: the lifetime of a certain type of light bulb follows an exponential distribution whose mean is $$100$$ hours (i.e. $$\lambda=1/100$$).

• What is the probability that a light bulb will last at least $$100$$ hours?

Answer: Since $$X\sim \text{Exp}(1/100)$$, we have $P(X>100)=1-P(X\le 100)=\exp(-100/100)\approx 0.37.$

• Given that a light bulb has already been burning for $$100$$ hours, what is the probability that it will last at least $$100$$ hours more?

Answer: we seek $$P(X>200\mid X>100)$$. By the memory-less property, $P(X>200\mid X>100)=P(X>200-100)=P(X>100)\approx 0.37.$

• The manufacturer wants to guarantee that their light bulbs will last at least $$t$$ hours. What should $$t$$ be in order to ensure that $$90\%$$ of the light bulbs will last longer than $$t$$ hours?

Answer: we need to find $$t$$ such that $$P(X>t)=0.9$$. In other words, we are looking for $$t$$ such that $0.9=P(X>t)=1-P(X\leq t)=1-F_X(t)=e^{-0.01t},$ that is, $\ln 0.9 = -0.01t \Longrightarrow t=-100\ln 0.9 \approx 10.5 \text{ hours}.$
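All three light-bulb answers follow from the survival function $$P(X>t)=e^{-t/100}$$; a Python sketch:

```python
import math

# Light bulb lifetimes: X ~ Exp(1/100), so P(X > t) = exp(-t/100).

def survival(t, lam=1/100):
    return math.exp(-lam * t)

p_100 = survival(100)                   # ~ 0.37
p_cond = survival(200) / survival(100)  # memoryless: equals p_100
t_guarantee = -100 * math.log(0.9)      # ~ 10.5 hours
```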

Exponentially distributed numbers are generated by rexp() in R, with required parameters n and rate.

We can draw from $$\text{Exp}(100)$$ as follows:

rexp(1,100)
[1] 0.0009430804

If we repeat the process 1000 times, the empirical mean and variance are:

q<-rexp(1000,100)
mean(q)
var(q)
[1] 0.01029523
[1] 0.000102973

And the histogram is displayed below:

hist(q)

3.3.5 Gamma Distributions

Assume that cars arrive according to a Poisson process with rate $$\lambda$$. Recall that if $$X$$ is the time to the first car arrival, then $$X\sim \text{Exp}(\lambda)$$.

If $$Y$$ is the wait time to the $$r$$th arrival, then $$Y$$ follows a Gamma distribution with parameters $$\lambda$$ and $$r$$, denoted $$Y\sim \Gamma(\lambda,r)$$, for which the p.d.f. is $f_Y(y)=\begin{cases} 0 & \text{for } y<0 \\ \frac{y^{r-1}}{(r-1)!}\lambda^r e^{- \lambda y } & \text{for } y\geq 0 \end{cases}$ The c.d.f. $$F_Y(y)$$ exists (it is the area under $$f_Y$$ from $$0$$ to $$y$$), but it cannot be expressed in terms of elementary functions.

We can show that $\mu=\text{E}[Y]=\frac{r}{\lambda}\quad\mbox{and}\quad\sigma^2=\text{Var}[Y]=\frac{r}{\lambda^2}.$

Examples

• Suppose that an average of $$30$$ customers per hour arrive at a shop in accordance with a Poisson process, that is to say, $$\lambda=1/2$$ customers arrive on average every minute. What is the probability that the shopkeeper will wait more than $$5$$ minutes before both of the first two customers arrive?

Answer: let $$Y$$ denote the wait time in minutes until the second customer arrives. Then $$Y\sim \Gamma(1/2,2)$$ and \begin{aligned} P(Y>5)&=\int_{5}^{\infty}\frac{y^{2-1}}{(2-1)!}(1/2)^2e^{-y/2}\, dy\\&=\int_5^{\infty}\frac{ye^{-y/2}}{4}\, dy \\ &=\frac{1}{4}\left[-2ye^{-y/2}-4e^{-y/2}\right]_{5}^{\infty}\\&=\frac{7}{2}e^{-5/2}\approx 0.287.\end{aligned}
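The closed-form answer agrees with a direct numerical integration of the Gamma p.d.f.; a Python sketch:

```python
import math

# P(Y > 5) for Y ~ Gamma(lambda = 1/2, r = 2): closed form (7/2)e^{-5/2}
# versus a midpoint Riemann sum of f_Y(y) = (y/4) e^{-y/2} over [5, 80]
# (the tail beyond 80 is negligible).

closed_form = 3.5 * math.exp(-2.5)   # ~ 0.287

n, upper = 200_000, 80.0
dy = (upper - 5.0) / n
numeric = sum(
    y * math.exp(-y / 2) / 4
    for y in ((5.0 + (i + 0.5) * dy) for i in range(n))
) * dy
```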

• Telephone calls arrive at a switchboard at a mean rate of $$\lambda=2$$ per minute, according to a Poisson process. Let $$Y$$ be the waiting time until the $$5$$th call arrives. What is the p.d.f., the mean, and the variance of $$Y$$?

Answer: we have \begin{aligned} f_Y(y)&=\frac{2^5y^4}{4!}e^{-2y}, \text{ for } 0\leq y<\infty,\\ \quad \text{E}[Y]&=\frac{5}{2}, \quad \text{Var}[Y]=\frac{5}{4}.\end{aligned}

The Gamma distribution can be extended to cases where $$r>0$$ is not an integer by replacing $$(r-1)!$$ with $\Gamma(r)=\int_{0}^{\infty}t^{r-1}e^{-t}\, dt.$ The exponential and the $$\chi^2$$ distributions (we will discuss the latter later) are special cases of the Gamma distribution: $$\text{Exp}(\lambda)=\Gamma(\lambda,1)$$ and $$\chi^2(r)=\Gamma(1/2,r/2)$$.

Gamma distributed numbers are generated by rgamma(), with required parameters n and shape; the rate (or, equivalently, scale=1/rate) defaults to 1.

We can draw from a $$\Gamma(3,2)$$ distribution (rate $$\lambda=3$$, shape $$r=2$$), for example, using:

rgamma(1,shape=2,scale=1/3)
[1] 2.249483

This can be repeated 1000 times, say, and we get the empirical mean and variance:

q<-rgamma(1000,shape=2, scale=1/3)
mean(q)
var(q)
[1] 0.6663675
[1] 0.2205931

The corresponding histogram is displayed below:

hist(q)

3.3.6 Normal Approximation of the Binomial Distribution

If $$X\sim\mathcal{B}(n,p)$$ then we may interpret $$X$$ as a sum of independent and identically distributed random variables $X=I_1+I_2+\cdots+I_n\ \text{ where each }\ I_i\sim\mathcal{B}(1,p)\,.$ Thus, according to the Central Limit Theorem (we will have more to say on the topic in a future section), for large $$n$$ we have $\frac{X-np}{\sqrt{np(1-p)}}\stackrel{\text{approx}}\sim\mathcal{N}(0,1)\,;$ for large $$n$$ if $$X\stackrel{\text{exact}}\sim\mathcal{B}(n,p)$$ then $$X\stackrel{\text{approx}}\sim\mathcal{N}(np,np(1-p))$$.

Normal Approximation with Continuity Correction

When $$X\sim \mathcal{B}(n,p)$$, we know that $$\text{E}[X]=np$$ and $$\text{Var}[X]=np(1-p)$$. If $$n$$ is large, we may approximate $$X$$ by a normal random variable in the following way: $P(X\le x)=P(X<x+0.5)=P\left(Z<\frac{x-np+0.5}{\sqrt{np(1-p)}}\right)$ and $P(X\ge x)=P(X>x-0.5)=P\left(Z>\frac{x-np-0.5}{\sqrt{np(1-p)}}\right).$ The $$\pm 0.5$$ terms in these expressions are the continuity corrections; they account for approximating the integer-valued $$X$$ by a continuous random variable, and they noticeably improve the quality of the approximation.

Example: suppose $$X\sim\mathcal{B}(36,0.5)$$. Provide a normal approximation to the probability $$P(X\leq 12)$$.

Answer: the expectation and the variance of a binomial r.v. are known: $\text{E}[X]=36(0.5)=18\quad\mbox{and}\quad \text{Var}[X]=36(0.5)(1-0.5) =9,$ and so \begin{aligned} P(X\leq12) &= P\left( \frac{X-18}{3}\leq\frac{12-18+0.5}{3}\right)\\ &\stackrel{\text{norm.approx'n}}\approx\Phi(-1.83) \stackrel{\text{table}}\approx0.033\,.\end{aligned}
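We can gauge the quality of the approximation against the exact binomial sum; a Python sketch:

```python
import math

# Exact P(X <= 12) for X ~ B(36, 0.5) versus the continuity-corrected
# normal approximation Phi((12 - 18 + 0.5)/3).

def Phi(z):
    return (1 + math.erf(z / math.sqrt(2))) / 2

exact = sum(math.comb(36, x) for x in range(13)) * 0.5 ** 36
approx = Phi((12 - 18 + 0.5) / 3)   # ~ 0.033
```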

Computing Binomial Probabilities

There are thus at least four ways of computing (or approximating) binomial probabilities:

• using the exact formula – if $$X\sim\mathcal{B}(n,p)$$ then for each $$x=0,1,\ldots,n$$, $$P(X=x)=\binom nxp^x(1-p)^{n-x}$$;

• using tables: if $$n\leq15$$ and $$p$$ is one of $$0.1,\ldots,0.9$$, then the corresponding c.d.f. can be found in many textbooks (we must first express the desired probability in terms of the c.d.f. $$P(X\leq x)$$), as in \begin{aligned} P(X<3)&=P(X\leq2); \\ P(X=7)&=P(X\leq7)-P(X\leq6) \,;\\ P(X>7)&=1-P(X\leq 7);\\ P(X\geq5)&=1-P(X\leq4),\, \text{ etc.}\end{aligned}

• using statistical software (pbinom() in R, say), and

• using the normal approximation when $$np$$ and $$n(1-p)$$ are both $$\geq5$$: $P(X\leq x)\approx \Phi\left( \frac{x+0.5-np}{\sqrt{np(1-p)}} \right)$ $P(X\ge x)\approx 1-\Phi\left( \frac{x-0.5-np}{\sqrt{np(1-p)}} \right).$

3.3.7 Other Continuous Distributions

Other common continuous distributions are listed in [30]:

• the Beta distribution, a family of 2-parameter distributions with one mode and which is useful to estimate success probabilities (special cases: uniform, arcsine, PERT distributions);

• the logit-normal distribution on $$(0,1)$$, which is used to model proportions;

• the Kumaraswamy distribution, which is used in simulations in lieu of the Beta distribution (as it has a closed form c.d.f.);

• the triangular distribution, which is typically used as a subjective description of a population for which there is only limited sample data (it is based on a knowledge of the minimum and maximum and a guess of the mode);

• the chi-squared distribution, the distribution of the sum of the squares of $$n$$ independent standard normal random variables, which is used in goodness-of-fit tests in statistics;

• the $$F$$-distribution, which arises as the ratio of two appropriately scaled chi-squared random variables, and is used in the analysis of variance;

• the Erlang distribution, which is the distribution of the sum of $$k$$ independent and identically distributed exponential random variables, and is used in queueing models (it is a special case of the Gamma distribution);

• the Pareto distribution, which is used to describe financial data and critical behavior;

• Student’s $$t$$ distribution, which arises when estimating the mean of a normally-distributed population in situations where the sample size is small and the population’s standard deviation is unknown;

• the logistic distribution, whose cumulative distribution function is the logistic function;

• the log-normal distribution, which describes variables that are the product of many small independent positive variables;

• etc.

References

[30]
Wikipedia, 2021.
[31]
R. E. Walpole, R. H. Myers, S. L. Myers, and K. Ye, Probability and Statistics for Engineers and Scientists, 8th ed. Pearson Education, 2007.