3.3 Continuous Distributions

How do we approach probabilities when there are uncountably many possible outcomes, as is the case when \(X\) represents the height of an individual in a population (i.e., when the outcomes reside in a continuous interval)? What is the probability that a randomly selected person is about \(6\) feet tall, say?

3.3.1 Continuous Random Variables

In the discrete case, the probability mass function \[f_X(x)=P(X=x)\] was the main object of interest. In the continuous case, the analogous role is played by the probability density function (p.d.f.), still denoted by \(f_X(x)\), but there is a major difference with discrete r.v.: \[f_X(x) \neq P(X=x).\] The (cumulative) distribution function (c.d.f.) of any such random variable \(X\) is also still defined by \[F_X(x)=P(X\leq x)\,,\] viewed as a function of a real variable \(x\); however \(P(X\leq x)\) is not simply computed by adding a few terms of the form \(P(X=x_i)\).

Note as well that \[\lim_{x\to -\infty}F_X(x)=0\quad\mbox{and}\quad\lim_{x\to +\infty}F_X(x)=1.\] We can describe the distribution of the random variable \(X\) via the following relationship between \(f_X(x)\) and \(F_X(x)\): \[f_X(x)=\frac{d}{dx}F_X(x);\] in the continuous case, probability theory is simply an application of calculus!

Area Under the Curve

For any \(a<b\), we have \[\left\{ X\leq b \right\} = \left\{ X\leq a \right\}\cup \left\{ a<X\leq b \right\},\] so that \[\begin{aligned} P\left( X\leq a \right)+P\left( a<X\leq b \right) &= P\left( X\leq b \right)\end{aligned}\] and thus \[\begin{aligned} P\left( a<X\leq b \right)&= P\left( X\leq b \right)- P\left( X\leq a \right)\\ &= F_X(b)-F_X(a)=\int_a^b f_X(x)\, dx\end{aligned}\]

Probability Density Function

The probability density function (p.d.f.) of a continuous random variable \(X\) is an integrable function \(f_X: X(\mathcal{S})\to \mathbb{R}\) such that:

  • \(f_X(x)\geq 0\) for all \(x\in X(\mathcal{S})\) and \(\displaystyle{\lim_{x\to \pm \infty}f_X(x)=0}\);

  • \(\int_{X(\mathcal{S})}f_X(x)\, dx=1\);

  • for any event \(A=(a,b)=\{X|a<X<b\}\), \[P(A)=P((a,b))=\int_a^b f_X(x)\, dx,\]

and the cumulative distribution function (c.d.f.) \(F_X\) is given by \[F_X(x)=P(X\leq x)=\int_{-\infty}^xf_X(t)\, dt.\] Unlike discrete distributions, the absence or presence of endpoints does not affect the probability computations for continuous distributions: for any \(a<b\), \[P(a<X<b)=P(a\leq X<b)=P(a<X\leq b)=P(a\leq X\leq b),\] all taking the value \[F_X(b)-F_X(a)=\int_a^bf_X(x)\, dx.\] Furthermore, for any \(x\), \[P(X>x) = 1-P(X\leq x)=1-F_X(x)=1-\int_{-\infty}^xf_X(t)\, dt;\] and for any \(a\), \[P\left( X=a \right)= P\left( a\leq X\leq a \right)= \int_a^{a} f_X(x)\,dx=0.\] That last result explains why it is pointless to speak of the probability of a random variable taking on a specific value in the continuous case; rather, we are interested in ranges of values.


  • Assume that \(X\) has the following p.d.f.: \[f_X(x)=\begin{cases}0 & \text{if $x<0$} \\ x/2 & \text{if $0\leq x\leq 2$} \\ 0 & \text{if $x>2$}\end{cases}\] Note that \(\int_{0}^2f_X(x)\, dx =1.\) The corresponding c.d.f. is given by: \[\begin{aligned} F_X&(x)=P(X\leq x)=\int_{-\infty}^x f_X(t)\, dt \\ &=\begin{cases} 0 & \text{if $x<0$} \\ 1/2\cdot \int_{0}^x t\, dt = x^2/4 & \text{if $0\leq x<2$} \\ 1 & \text{if $x\geq 2$} \end{cases}\end{aligned}\]

    The p.d.f. and the c.d.f. for this r.v. are shown in Figure 3.8.


    Figure 3.8: P.d.f. and c.d.f. for the r.v. \(X\) defined above.

  • What is the probability of the event \[A=\{X|0.5<X<1.5\}?\]

    Answer: we need to evaluate \[\begin{aligned} P(A)&=P(0.5<X<1.5)=F_X(1.5)-F_X(0.5)\\ &=\frac{(1.5)^2}{4}-\frac{(0.5)^2}{4} =\frac{1}{2}. \end{aligned}\]


    Figure 3.9: P.d.f. and c.d.f. for the r.v. \(X\) defined above, with appropriate region.

  • What is the probability of the event \(B=\{X|X=1\}\)?

    Answer: we need to evaluate \[P(B) = P(X=1)=P(1\leq X\leq 1)=F_{X}(1)-F_X(1)=0.\] This is not unexpected: even though \(f_X(1)=0.5\neq 0\), \(P(X=1)=0\), as we saw earlier.

  • Assume that, for \(\lambda>0\), \(X\) has the following p.d.f.: \[f_X(x)=\begin{cases} \lambda\exp(-\lambda x) & \text{if $x\geq 0$}\\ 0&\text{if $x<0$} \end{cases}\] Verify that \(f_X\) is a p.d.f. for all \(\lambda>0\), and compute the probability that \(X>10.2\).

    Answer: that \(f_X\) is non-negative is obvious; the only work goes into showing that it integrates to \(1\): \[\begin{aligned} \int_{-\infty}^{\infty}&f_X(x)\, dx =\int_{0}^{\infty}\lambda\exp(-\lambda x)\, dx\\&=\lim_{b\to\infty}\int_{0}^{b}\lambda\exp(-\lambda x)\, dx\\&=\lim_{b\to\infty}\lambda \left[\frac{\exp(-\lambda x)}{-\lambda}\right]_0^b =\lim_{b\to\infty}\left[-\exp(-\lambda x)\right]_0^b\\ &=\lim_{b\to\infty}\left[-\exp(-\lambda b)+\exp(0)\right]=1.\end{aligned}\] The corresponding c.d.f. is given by: \[\begin{aligned} F_X(x;\lambda)&=P_{\lambda}(X\leq x)=\int_{-\infty}^{x}f_X(t)\, dt\\&=\begin{cases} 0 & \text{if $x<0$} \\ \lambda\int_0^x\exp(-\lambda t)\, dt & \text{if $x\geq 0$}\end{cases} \\ & = \begin{cases} 0 & \text{if $x<0$} \\ [-\exp(-\lambda t)]_0^x & \text{if $x\geq 0$} \end{cases} \\ &= \begin{cases} 0 & \text{if $x<0$} \\ 1-\exp(-\lambda x) & \text{if $x\geq 0$} \end{cases}\end{aligned}\] Then \[\begin{aligned} P_{\lambda}(X>10.2)&=1-F_X(10.2;\lambda)=1-[1-\exp(-10.2\lambda)]\\&=\exp(-10.2\lambda)\end{aligned}\] is a function of the distribution parameter \(\lambda\) itself:

    \(\lambda\) \(P_{\lambda}(X>10.2)\)
    \(0.002\) \(0.9798\)
    \(0.02\) \(0.8155\)
    \(0.2\) \(0.1300\)
    \(2\) \(1.38 \times 10^{-9}\)
    \(20\) \(2.54 \times 10^{-89}\)
    \(200\) \(0\) (for all intents and purposes)

    For \(\lambda=0.2\), for instance, the p.d.f. and c.d.f. are:


    Figure 3.10: P.d.f. and c.d.f. for the r.v. \(X\) defined above, with \(\lambda=0.2\).

    The probability that \(X>10.2\) is the area (to \(\infty\)) in blue, below.


    Figure 3.11: Probability of \(X>10.2\), for \(X\) defined above, with \(\lambda=0.2\).

    For \(\lambda=2\), the probability is so small (\(1.38\times 10^{-9}\)) that it does not even appear in the p.d.f. (see below).


    Figure 3.12: Probability of \(X>10.2\), for \(X\) defined above, with \(\lambda=2\).

    Note that in all cases, the shapes of the p.d.f. and the c.d.f. are the same (the spike when \(\lambda=2\) is much higher than that when \(\lambda=0.2\) – why must that be the case?). This is not a general property of distributions, however, but a property of this specific family of distributions.
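The two examples above can be checked numerically. The following R sketch is only an illustration: the names f1, total_area, p_A, tail_formula, and tail_pexp are introduced here, and the computations rely on the base R functions integrate() and pexp().

```r
# p.d.f. of the first example: f_X(x) = x/2 on [0, 2], 0 elsewhere
f1 <- function(x) ifelse(x >= 0 & x <= 2, x / 2, 0)

# Total area under the p.d.f. and P(0.5 < X < 1.5), by numerical integration
total_area <- integrate(f1, lower = 0, upper = 2)$value
p_A <- integrate(f1, lower = 0.5, upper = 1.5)$value
print(c(total_area = total_area, p_A = p_A))

# Second example: P(X > 10.2) = exp(-10.2 * lambda) for several values of lambda
lambda <- c(0.002, 0.02, 0.2, 2, 20, 200)
tail_formula <- exp(-10.2 * lambda)                          # closed form derived above
tail_pexp <- pexp(10.2, rate = lambda, lower.tail = FALSE)   # built-in exponential tail
print(data.frame(lambda, tail_formula, tail_pexp))
```

The two tail columns should agree, since pexp() with lower.tail = FALSE evaluates \(1-F_X(x;\lambda)\) directly.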

3.3.2 Expectation of a Continuous Random Variable

For a continuous random variable \(X\) with p.d.f. \(f_X(x)\), the expectation of \(X\) is defined as \[\text{E}[X]=\int_{-\infty}^\infty x f_X(x)\,dx\,.\] For any function \(h(X)\), we can also define \[\text{E}\left[ h(X) \right] = \int_{-\infty}^\infty h(x) f_X(x)\,dx\,.\]


  • Find \(\text{E}[X]\) and \(\text{E}[X^2]\) in the first example, above.

    Answer: we need to evaluate \[\begin{aligned} \text{E} [X]&=\int_{-\infty}^{\infty}xf_X(x)\, dx=\int_0^2xf_X(x)\,dx \\ &=\int_0^2\frac{x^2}{2}\, dx = \left[\frac{x^3}{6}\right]_{x=0}^{x=2}=\frac{4}{3};\\ \text{E}[X^2]&=\int_0^2\frac{x^3}{2}\, dx=2.\end{aligned}\]

  • Note that the expectation need not exist! Compute the expectation of the random variable \(X\) with p.d.f. \[f_X(x)=\frac{1}{\pi(1+x^2)}, \quad-\infty<x<\infty.\]

    Answer: let’s verify that \(f_X(x)\) is indeed a p.d.f.: \[\begin{aligned} \int_{-\infty}^{\infty}f_X(x)\, dx&= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{1+x^2}\, dx \\&= \frac{1}{\pi}[\arctan(x)]^{\infty}_{-\infty}=\frac{1}{\pi}\left[\frac{\pi}{2}+\frac{\pi}{2}\right]=1. \end{aligned}\]


    Figure 3.13: P.d.f. and c.d.f. for the Cauchy distribution.

    We can also easily see that \[\begin{aligned} F_X(x)&=P(X\leq x)=\int_{-\infty}^xf_X(t)\, dt\\& =\frac{1}{\pi}\int_{-\infty}^x\frac{1}{1+t^2}\, dt=\frac{1}{\pi}\arctan(x)+\frac{1}{2}.\end{aligned}\] For instance, \(P(X\leq 3)=\frac{1}{\pi}\arctan(3)+\frac{1}{2}\), say.


    Figure 3.14: P.d.f. and c.d.f. for the Cauchy distribution, with area under the curve.

    The expectation of \(X\) is \[\begin{aligned} \text{E}[X]&=\int_{-\infty}^{\infty}xf_X(x)\, dx = \int_{-\infty}^{\infty}\frac{x}{\pi(1+x^2)}\, dx.\end{aligned}\] If this improper integral exists, then it needs to be equal both to \[\underbrace{\int_{-\infty}^0\frac{x}{\pi(1+x^2)}\, dx + \int_0^{\infty}\frac{x}{\pi(1+x^2)}\, dx}_{\text{candidate $1$}}\] and to the Cauchy principal value \[\underbrace{\lim_{a\to\infty}\int_{-a}^a\frac{x}{\pi(1+x^2)}\, dx}_{\text{candidate $2$}}.\] But it is straightforward to find an antiderivative of \(\frac{x}{\pi(1+x^2)}.\) Set \(u=1+x^2\). Then \(du=2x\,dx\) and \(x\,dx=\frac{du}{2}\), and we obtain \[\int \frac{x}{\pi(1+x^2)}\, dx=\frac{1}{2\pi}\int \frac{du}{u}=\frac{1}{2\pi}\ln|u|=\frac{1}{2\pi}\ln(1+x^2).\] Then the candidate \(2\) integral reduces to \[\begin{aligned} \lim_{a\to\infty}\left[\frac{\ln(1+x^2)}{2\pi}\right]_{-a}^a&=\lim_{a\to\infty}\left[\frac{\ln(1+a^2)}{2\pi}-\frac{\ln(1+(-a)^2)}{2\pi}\right] \\ &=\lim_{a\to\infty}0=0;\end{aligned}\] while the candidate \(1\) integral reduces to \[\left[\frac{\ln(1+x^2)}{2\pi}\right]^0_{-\infty}+\left[\frac{\ln(1+x^2)}{2\pi}\right]^{\infty}_0 =0-(\infty)+\infty-0=\infty-\infty\] which is undefined. Thus \(\text{E}[X]\) cannot exist, as it would have to be both equal to \(0\) and undefined simultaneously.
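Both examples can be explored numerically. The R sketch below is an illustration under stated assumptions: the names f1, EX, EX2, g, a, symmetric, one_sided, and closed_form are introduced here, and dcauchy() is R's built-in density for the p.d.f. \(1/(\pi(1+x^2))\). It evaluates the two expectations of the first example, then shows that the symmetric truncated Cauchy integrals vanish while the one-sided ones keep growing like \(\ln(1+a^2)/(2\pi)\).

```r
# First example: f_X(x) = x/2 on [0, 2]
f1 <- function(x) x / 2
EX  <- integrate(function(x) x   * f1(x), lower = 0, upper = 2)$value
EX2 <- integrate(function(x) x^2 * f1(x), lower = 0, upper = 2)$value
print(c(EX = EX, EX2 = EX2))

# Cauchy example: integrand of E[X]
g <- function(x) x * dcauchy(x)

a <- c(10, 100, 1000)   # truncation points (arbitrary illustrative values)
symmetric <- sapply(a, function(u) integrate(g, lower = -u, upper = u)$value)
one_sided <- sapply(a, function(u) integrate(g, lower = 0,  upper = u)$value)
closed_form <- log(1 + a^2) / (2 * pi)   # antiderivative evaluated between 0 and a

print(data.frame(a, symmetric, one_sided, closed_form))
```

The one-sided column increases without bound as \(a\) grows, which is the numerical counterpart of the \(\infty-\infty\) obtained for candidate \(1\).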

Mean and Variance

In a similar way to the discrete case, the mean of \(X\) is defined to be \(\text{E}[X]\), and the variance and standard deviation of \(X\) are, as before, \[\begin{aligned} \text{Var}[X]&\stackrel{\text{def}}= \text{E}\left[( X-\text{E}[X])^2 \right] =\text{E}[X^2]- \text{E}^2[X]\,, \\ \text{SD}[X]&=\sqrt{\text{Var}[X]}\,.\end{aligned}\] As in the discrete case, if \(X,Y\) are continuous random variables, and \(a,b\in\mathbb{R}\), then \[\begin{aligned} \text{E}[aY+bX]&= a\text{E}[Y]+b\text{E}[X]\\ \text{Var}[ a+bX ]&= b^2\text{Var}[X]\\ \text{SD}[ a+bX ]&=|b|\text{SD}[X]\end{aligned}\] The interpretations of the mean as a measure of centrality and of the variance as a measure of dispersion are unchanged in the continuous case.

For the time being, however, we cannot easily compute the variance of a sum \(X+Y\), unless \(X\) and \(Y\) are independent random variables, in which case \[\text{Var}[X+Y]= \text{Var}[X]+\text{Var}[Y].\]
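As a rough check of these identities, the R sketch below uses the p.d.f. \(f_X(x)=x/2\) on \([0,2]\) from the first example of this section. The names f1, m, v_def, v_short, x, a, and b are introduced here for illustration, and the sampling step (inverting the c.d.f. \(F_X(x)=x^2/4\)) is a technique assumed for the purpose of the sketch, not one discussed in the text.

```r
# p.d.f. of the first example on [0, 2]
f1 <- function(x) x / 2

# Variance from the definition and from the shortcut E[X^2] - E[X]^2
m       <- integrate(function(x) x * f1(x), lower = 0, upper = 2)$value
v_def   <- integrate(function(x) (x - m)^2 * f1(x), lower = 0, upper = 2)$value
v_short <- integrate(function(x) x^2 * f1(x), lower = 0, upper = 2)$value - m^2
print(c(mean = m, var_definition = v_def, var_shortcut = v_short))

# Simulated sample from X: if U is uniform on (0, 1), then 2 * sqrt(U) has c.d.f. x^2 / 4
set.seed(1)
x <- 2 * sqrt(runif(1e5))

# Linear transformation a + bX (a and b are arbitrary illustrative constants)
a <- 3
b <- -2
print(c(var(a + b * x), b^2 * var(x)))
print(c(sd(a + b * x), abs(b) * sd(x)))
```

Both variance computations should return \(2/9\), and the sample versions of the linear-transformation rules hold up to floating-point error.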

3.3.3 Normal Distributions

A very important example of a continuous distribution is that provided by the probability density function \[\phi(z)=\frac1{\sqrt{2\pi}}e^{-z^2/2}\,.\] The corresponding cumulative distribution function is denoted by \[\Phi(z)=P(Z\leq z)=\int_{-\infty}^z \phi(t)\,dt\,.\] A random variable \(Z\) with this c.d.f. is said to have a standard normal distribution, denoted by \(Z\sim\mathcal N(0,1)\).


Figure 3.15: P.d.f. and c.d.f. for the standard normal distribution.

Standard Normal Random Variable

The expectation and variance of \(Z\sim\mathcal{N}(0,1)\) are \[\begin{aligned} \text{E}[Z]&= \int_{-\infty}^\infty z\, \phi(z)\, dz = \int_{-\infty}^{\infty}z\,\frac1{\sqrt{2\pi}} e^{-\frac12 z^2}\,dz =0, \\ \text{Var}[Z]&=\int_{-\infty}^{\infty}z^2\, \phi(z)\, dz = 1, \\ \text{SD}[Z]&=\sqrt{\text{Var}[Z]}=\sqrt{1}=1.\end{aligned}\] Other quantities of interest include: \[\begin{aligned} \Phi(0)&=P(Z\leq 0)=\frac{1}{2},\quad \Phi(-\infty)=0,\quad \Phi(\infty)=1,\\ \Phi(1)&=P(Z\leq 1)\approx 0.8413, \quad \text{etc.}\end{aligned}\]
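These values can be confirmed numerically. In the R sketch below (an illustration; the names area, EZ, and EZ2 are introduced here), dnorm() plays the role of \(\phi\) and pnorm() that of \(\Phi\).

```r
# Area under the standard normal p.d.f., and the first two moments of Z
area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
EZ   <- integrate(function(z) z   * dnorm(z), lower = -Inf, upper = Inf)$value
EZ2  <- integrate(function(z) z^2 * dnorm(z), lower = -Inf, upper = Inf)$value

# Since E[Z] = 0, Var[Z] = E[Z^2]
print(c(area = area, EZ = EZ, VarZ = EZ2 - EZ^2))

# Selected values of the c.d.f.
print(c(pnorm(-Inf), pnorm(0), pnorm(1), pnorm(Inf)))
```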

Normal Random Variables

Let \(\sigma>0\) and \(\mu \in \mathbb{R}\).

If \(Z\sim\mathcal{N}(0,1)\) and \(X=\mu+\sigma Z\), then \[\frac{X-\mu}\sigma = Z \sim\mathcal{N}(0,1).\] Thus, the c.d.f. of \(X\) is given by \[\begin{aligned} F_X(x)&=P( X\leq x ) \\&= P( \mu+\sigma Z\leq x ) =P\left( Z\leq \frac{x-\mu}\sigma \right)\\&= \Phi\left( \frac{x-\mu}\sigma \right)\,;\end{aligned}\] its p.d.f. must then be \[\begin{aligned} f_X(x)&=\frac d{dx}F_X(x) \\&= \frac d{dx} \Phi\left( \frac{x-\mu}\sigma \right)\\ &= \frac1\sigma\, \phi\left( \frac{x-\mu}\sigma \right).\end{aligned}\] Any random variable \(X\) with this c.d.f./p.d.f. satisfies \[\begin{aligned} \text{E}[X]&=\mu+\sigma\text{E}[Z]=\mu,\\ \text{Var}[X]&=\sigma^2\text{Var}[Z]=\sigma^2, \\ \text{SD}[X]&=\sigma\end{aligned}\] and is said to be normal with mean \(\mu\) and variance \({\sigma^2}\), denoted by \(X\sim\mathcal{N}(\mu,\sigma^2)\).
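The standardisation formulas can be compared against R's built-in normal functions, which accept mean and sd arguments. In this sketch the values of mu and sigma and the evaluation points x are arbitrary choices made for illustration.

```r
mu    <- -2   # illustrative mean
sigma <- 3    # illustrative standard deviation
x     <- c(-8, -3.5, -2, 0, 4.25)

# c.d.f.: Phi((x - mu) / sigma) versus the direct call
cdf_standardised <- pnorm((x - mu) / sigma)
cdf_direct       <- pnorm(x, mean = mu, sd = sigma)

# p.d.f.: phi((x - mu) / sigma) / sigma versus the direct call
pdf_standardised <- dnorm((x - mu) / sigma) / sigma
pdf_direct       <- dnorm(x, mean = mu, sd = sigma)

print(data.frame(x, cdf_standardised, cdf_direct, pdf_standardised, pdf_direct))
```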

As it happens, every general normal \(X\) can be obtained by a linear transformation of the standard normal \(Z\). Traditionally, probability computations for normal distributions are done with tables which compile values of the standard normal distribution c.d.f., such as the one found in [31] (see for a preview). With the advent of freely-available statistical software, the need for tabulated values has decreased.\(^{28}\)

In R, the standard normal c.d.f. \(F_Z(z)=P(Z\leq z)\) can be computed with the function pnorm(z) – for instance, pnorm(0)=0.5. (In the example below, whenever \(P(Z\leq a)\) is evaluated for some \(a\), the value is found either by consulting a table or using pnorm.)


  • Let \(Z\) represent the standard normal random variable. Then:

    1. \(P(Z\le 0.5)=0.6915\)

    2. \(P(Z<-0.3)=0.3821\)

    3. \(P(Z>0.5)=1-P(Z\le 0.5)=1-0.6915=0.3085\)

    4. \(P(0.1<Z<0.3)=P(Z<0.3)-P(Z<0.1)=0.6179-0.5398=0.0781\)

    5. \(P(-1.2<Z<0.3)=P(Z<0.3)-P(Z<-1.2)=0.5028\)

  • Suppose that the waiting time (in minutes) in a coffee shop at 9am is normally distributed with mean \(5\) and standard deviation \(0.5\).\(^{29}\) What is the probability that the waiting time for a customer is at most \(6\) minutes?

    Answer: let \(X\) denote the waiting time.

    Then \(X\sim\mathcal{N}(5,0.5^2)\) and the standardised random variable is a standard normal: \[Z=\frac{X-5}{0.5} \sim\mathcal{N}(0,1)\,.\] The desired probability is \[\begin{aligned} P\left(X\leq6 \right) &= P\left( \frac{X-5}{0.5}\leq \frac{6-5}{0.5} \right)\\ & = P\left(Z\le \frac{6-5}{0.5} \right)= \Phi\left( \frac{6-5}{0.5} \right)\\&=\Phi(2)=P(Z\leq 2)\approx 0.9772 .\end{aligned}\]

  • Suppose that bottles of beer are filled in such a way that the actual volume of the liquid content (in mL) varies randomly according to a normal distribution with \(\mu=376.1\) and \(\sigma=0.4\).\(^{30}\) What is the probability that the volume in any randomly selected bottle is less than \(375\) mL?

    Answer: let \(X\) denote the volume of the liquid in the bottle. Then \[\begin{aligned} X\sim\mathcal{N}(376.1,0.4^2)\implies Z=\frac{X-376.1}{0.4}\sim\mathcal{N}(0,1)\,.\end{aligned}\] The desired probability is thus \[\begin{aligned} P\left( X<375 \right) &= P\left( \frac{X-376.1}{0.4}<\frac{375-376.1}{0.4} \right) \\ &=P\left( Z<\frac{-1.1}{0.4} \right)\\&=P(Z\leq -2.75)=\Phi\left( -2.75 \right)\approx 0.003\,.\end{aligned}\]

  • If \(Z\sim\mathcal{N}(0,1)\), for which values \(a\), \(b\) and \(c\) do we have:

    1. \(P(Z\leq a)=0.95\);

    2. \(P(|Z|\le b)=P(-b\leq Z\leq b)=0.99\);

    3. \(P(|Z|\geq c)=0.01\).


    1. From the table (or R) we see that \[P(Z\leq 1.64)\approx 0.9495,\ P(Z\leq 1.65)\approx 0.9505\,.\] Clearly we must have \(1.64<a<1.65\); a linear interpolation provides a decent guess at \(a\approx1.645\).\(^{31}\)

    2. Note that \[P\left( -b\leq Z\leq b \right)=P(Z\leq b)-P(Z<-b).\] However, the p.d.f. \(\phi(z)\) is symmetric about \(z=0\), which means that \[P(Z<-b)=P(Z>b)=1-P(Z\leq b),\] so that \[\begin{aligned} P\left( -b\leq Z\leq b \right)&=P(Z\leq b)-\left[ 1-P(Z\leq b) \right]\\& =2P(Z\leq b)-1.\end{aligned}\] In the question, \(P(-b\le Z \le b)=0.99\), so that \[\begin{aligned} 2P(Z\le b)-1=0.99\implies \ P(Z\leq b)=\frac{1+0.99}2 = 0.995\,.\end{aligned}\] Consulting the table, we see that \[P(Z\leq 2.57)\approx 0.9949,\ P(Z\leq 2.58)\approx 0.9951;\] a linear interpolation suggests that \(b\approx2.575\).

    3. Note that \(\left\{ |Z|\geq c \right\}=\left\{ |Z|<c \right\}^c\), so we need to find \(c\) such that \[\begin{aligned} P\left(|Z|<c \right)=1-P\left( |Z|\geq c \right) = 0.99.\end{aligned}\] But this is equivalent to \[\begin{aligned} P\left( -c<Z<c \right)=P(-c\leq Z\leq c)=0.99\end{aligned}\] as \(|x|<y \Leftrightarrow -y<x<y\), and \(P(Z=c)=0\) for all \(c\). This problem was solved in part 2; set \(c\approx 2.575\).
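The table look-ups in the examples above can be reproduced in R with pnorm() and its inverse qnorm(). The sketch below is illustrative; the variable names (p1 to p5, coffee, beer, a, b) are introduced here.

```r
# Standard normal probabilities
p1 <- pnorm(0.5)                  # P(Z <= 0.5)
p2 <- pnorm(-0.3)                 # P(Z < -0.3)
p3 <- 1 - pnorm(0.5)              # P(Z > 0.5)
p4 <- pnorm(0.3) - pnorm(0.1)     # P(0.1 < Z < 0.3)
p5 <- pnorm(0.3) - pnorm(-1.2)    # P(-1.2 < Z < 0.3)
print(round(c(p1, p2, p3, p4, p5), 4))

# Coffee shop: X ~ N(5, 0.5^2), P(X <= 6)
coffee <- pnorm(6, mean = 5, sd = 0.5)

# Bottles: X ~ N(376.1, 0.4^2), P(X < 375)
beer <- pnorm(375, mean = 376.1, sd = 0.4)
print(c(coffee = coffee, beer = beer))

# Quantiles: P(Z <= a) = 0.95 and P(Z <= b) = 0.995
a <- qnorm(0.95)
b <- qnorm(0.995)
print(c(a = a, b = b))
```

Using qnorm() avoids the linear interpolation step that is needed when reading values off a table.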

Normally distributed numbers can be generated by rnorm() in R, which accepts three parameters: n, mean, and sd. The default parameter values are mean=0 and sd=1.

We can draw a single number from \(\mathcal{N}(0,1)\) as follows:

rnorm(1)
[1] -0.2351372

We can generate a histogram of a sample of size 500, say, from \(\mathcal{N}(0,1)\) as follows:

z<-rnorm(500)
hist(z)

A histogram with 20 bins is shown below:

brks = seq(min(z),max(z),(max(z)-min(z))/20) 
hist(z, breaks = brks)

For normal distributions with mean \(\mu\) and standard deviation \(\sigma\), we need to modify the call to rnorm(). For instance, we can draw 5000 observations from \(\mathcal{N}(-2,3^2)\) using the following code:

w<-rnorm(5000, sd=3, mean=-2)
mean(w)
[1] -1.943782
sd(w)
[1] 2.920071

A histogram with 50 bins is displayed below:

brks = seq(min(w),max(w),(max(w)-min(w))/50) 
hist(w, breaks = brks)

3.3.4 Exponential Distributions

Assume that cars arrive according to a Poisson process with rate \(\lambda\), that is, the number of cars arriving within a fixed unit time period is a Poisson random variable with parameter \(\lambda\).

Over a period of time \(x\), we would then expect the number of arrivals \(N\) to follow a Poisson distribution with parameter \(\lambda x\). Let \(X\) be the wait time to the first car arrival. Then \[P(X>x)=1-P(X\leq x)=P(N=0)=\exp(-\lambda x).\] We say that \(X\) follows an exponential distribution \(\text{Exp}(\lambda)\): \[\begin{aligned} F_X(x)&=\begin{cases} 0 & \text{for $x<0$} \\ 1-e^{-\lambda x} & \text{for $0\leq x$} \end{cases} \\ f_X(x)&=\begin{cases} 0 & \text{for $x<0$} \\ \lambda e^{-\lambda x} & \text{for $0\leq x$} \end{cases} \end{aligned}\] Note that \(f_X(x)=F'_X(x)\) for all \(x\neq 0\).

If \(X\sim\text{Exp}(4)\), then \(P(X< 0.5)=F_X(0.5)=1-e^{-4(0.5)}\approx 0.865\) is the area of the shaded region in Figure 3.16, below.


Figure 3.16: P.d.f. and c.d.f. for the exponential distribution with parameter \(\lambda=4\) [source unknown].
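The link between the Poisson count and the exponential wait time can be checked directly in R. In this illustrative sketch (the names lambda, x, p_no_arrival, p_wait_longer, and p_wait_shorter are introduced here), dpois() gives \(P(N=0)\) and pexp() gives the exponential c.d.f.

```r
lambda <- 4     # arrival rate per unit of time
x      <- 0.5   # length of the waiting period

# P(N = 0) when N is Poisson with parameter lambda * x
p_no_arrival <- dpois(0, lambda * x)

# P(X > x) and P(X < x) when X ~ Exp(lambda)
p_wait_longer  <- pexp(x, rate = lambda, lower.tail = FALSE)
p_wait_shorter <- pexp(x, rate = lambda)

print(c(p_no_arrival, p_wait_longer, p_wait_shorter))
```

The first two values coincide (both equal \(e^{-\lambda x}\)), and the third is the shaded area \(P(X<0.5)\approx 0.865\).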


If \(X\sim\text{Exp}(\lambda)\), then

  • \(\mu=\text{E} [X]=1/\lambda\), since \[\begin{aligned} \mu&=\int_{0}^{\infty}x\lambda e^{-\lambda x}\, dx=\left[-\frac{\lambda x+1}{\lambda}e^{-\lambda x}\right]_{0}^{\infty} \\ &=\left[0+\frac{\lambda(0)+1}{\lambda}e^{-0}\right]\\&=\frac{1}{\lambda}; \end{aligned}\]

  • \(\sigma^2=\text{Var} [X]=1/\lambda^2\), since \[\begin{aligned} \sigma^2&=\int_{0}^{\infty}\left(x-\text{E}[X]\right)^2\lambda e^{-\lambda x}\, dx\\&=\int_{0}^{\infty}\left(x-\frac{1}{\lambda}\right)^2\lambda e^{-\lambda x}\, dx\\&=\left[-\frac{\lambda^2 x^2+1}{\lambda^2}e^{-\lambda x}\right]_{0}^{\infty} \\ &=\left[0+\frac{\lambda^2(0)^2+1}{\lambda^2}e^{-0}\right]\\&=\frac{1}{\lambda^2}; \end{aligned}\]

  • and \(P(X>s+t\mid X>t)=P(X>s),\) for all \(s,t>0\), since \[\begin{aligned} P(X>s+t&\mid X>t)= \frac{P(X>s+t \text{ and } X>t)}{P(X>t)} \\&=\frac{P(X>s+t)}{P(X>t)}=\frac{1-F_X(s+t)}{1-F_X(t)} \\&=\frac{\exp(-\lambda (s+t))}{\exp(-\lambda t)}\\& =\exp(-\lambda s)=P(X>s)\end{aligned}\] (we say that exponential distributions are memoryless).

In a sense, \(\text{Exp}(\lambda)\) is the continuous analogue to the geometric distribution \(\text{Geo}(p)\).
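The three properties above can be verified numerically for a particular rate. In the R sketch below, the value of lambda and the times s0 and t0 are arbitrary illustrative choices, and all variable names are introduced here.

```r
lambda <- 0.2
f <- function(x) dexp(x, rate = lambda)   # exponential p.d.f.

# Mean and variance by numerical integration over [0, Inf)
mu     <- integrate(function(x) x * f(x), lower = 0, upper = Inf)$value
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = Inf)$value
print(c(mu = mu, one_over_lambda = 1 / lambda))
print(c(sigma2 = sigma2, one_over_lambda_sq = 1 / lambda^2))

# Memoryless property: P(X > s0 + t0 | X > t0) = P(X > s0)
s0 <- 3
t0 <- 7
lhs <- pexp(s0 + t0, rate = lambda, lower.tail = FALSE) /
  pexp(t0, rate = lambda, lower.tail = FALSE)
rhs <- pexp(s0, rate = lambda, lower.tail = FALSE)
print(c(lhs = lhs, rhs = rhs))
```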

Example: the lifetime of a certain type of light bulb follows an exponential distribution whose mean is \(100\) hours (i.e. \(\lambda=1/100\)).

  • What is the probability that a light bulb will last at least \(100\) hours?

    Answer: Since \(X\sim \text{Exp}(1/100)\), we have \[P(X>100)=1-P(X\le 100)=\exp(-100/100)\approx 0.37.\]

  • Given that a light bulb has already been burning for \(100\) hours, what is the probability that it will last at least \(100\) hours more?

    Answer: we seek \(P(X>200\mid X>100)\). By the memoryless property, \[P(X>200\mid X>100)=P(X>200-100)=P(X>100)\approx 0.37.\]

  • The manufacturer wants to guarantee that their light bulbs will last at least \(t\) hours. What should \(t\) be in order to ensure that \(90\%\) of the light bulbs will last longer than \(t\) hours?

    Answer: we need to find \(t\) such that \(P(X>t)=0.9\). In other words, we are looking for \(t\) such that \[0.9=P(X>t)=1-P(X\leq t)=1-F_X(t)=e^{-0.01t},\] that is, \[\ln 0.9 = -0.01t \Longrightarrow t=-100\ln 0.9 \approx 10.5 \text{ hours}.\]
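The light bulb computations can be reproduced with pexp() and the quantile function qexp(). This sketch is illustrative; the names rate, p_100, p_cond, and t90 are introduced here.

```r
rate <- 1 / 100   # mean lifetime of 100 hours

# P(X > 100)
p_100 <- pexp(100, rate = rate, lower.tail = FALSE)

# P(X > 200 | X > 100), computed from the definition of conditional probability
p_cond <- pexp(200, rate = rate, lower.tail = FALSE) /
  pexp(100, rate = rate, lower.tail = FALSE)

# Guarantee time t with P(X > t) = 0.9, i.e. F_X(t) = 0.1
t90 <- qexp(0.1, rate = rate)

print(c(p_100 = p_100, p_cond = p_cond, t90 = t90))
```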

Exponentially distributed numbers are generated by rexp() in R, with parameters n and rate (the default is rate=1).

We can draw from \(\text{Exp}(100)\) as follows:

rexp(1,100)
[1] 0.0009430804

If we repeat the process 1000 times, the empirical mean and variance are:

q<-rexp(1000,100)
mean(q)
[1] 0.01029523
var(q)
[1] 0.000102973

And the histogram is displayed below:


3.3.5 Gamma Distributions

Assume that cars arrive according to a Poisson process with rate \(\lambda\). Recall that if \(X\) is the time to the first car arrival, then \(X\sim \text{Exp}(\lambda)\).

If \(Y\) is the wait time to the \(r\)th arrival, then \(Y\) follows a Gamma distribution with parameters \(\lambda\) and \(r\), denoted \(Y\sim \Gamma(\lambda,r)\), for which the p.d.f. is \[f_Y(y)=\begin{cases} 0 & \text{for $y<0$} \\ \frac{y^{r-1}}{(r-1)!}\lambda^r e^{- \lambda y } & \text{for $y\geq 0$} \end{cases}\] The c.d.f. \(F_Y(y)\) exists (it is the area under \(f_Y\) from \(0\) to \(y\)), but in general it cannot be expressed with elementary functions (for integer values of \(r\), repeated integration by parts does yield a closed form).

We can show that \[\mu=\text{E}[Y]=\frac{r}{\lambda}\quad\mbox{and}\quad\sigma^2=\text{Var}[Y]=\frac{r}{\lambda^2}.\]


  • Suppose that an average of \(30\) customers per hour arrive at a shop in accordance with a Poisson process, that is to say, \(\lambda=1/2\) customers arrive on average every minute. What is the probability that the shopkeeper will wait more than \(5\) minutes before both of the first two customers arrive?

    Answer: let \(Y\) denote the wait time in minutes until the second customer arrives. Then \(Y\sim \Gamma(1/2,2)\) and \[\begin{aligned} P(Y>5)&=\int_{5}^{\infty}\frac{y^{2-1}}{(2-1)!}(1/2)^2e^{-y/2}\, dy\\&=\int_5^{\infty}\frac{ye^{-y/2}}{4}\, dy \\ &=\frac{1}{4}\left[-2ye^{-y/2}-4e^{-y/2}\right]_{5}^{\infty}\\&=\frac{7}{2}e^{-5/2}\approx 0.287.\end{aligned}\]

  • Telephone calls arrive at a switchboard at a mean rate of \(\lambda=2\) per minute, according to a Poisson process. Let \(Y\) be the waiting time until the \(5\)th call arrives. What is the p.d.f., the mean, and the variance of \(Y\)?

    Answer: we have \[\begin{aligned} f_Y(y)&=\frac{2^5y^4}{4!}e^{-2y}, \text{ for $0\leq y<\infty$},\\ \quad \text{E}[Y]&=\frac{5}{2}, \quad \text{Var}[Y]=\frac{5}{4}.\end{aligned}\]
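Both examples can be checked with R's Gamma functions, which take a shape argument (here \(r\)) and a rate argument (here \(\lambda\)). The sketch below is illustrative; the names p_wait, closed_form, f_Y, y, mean_Y, and var_Y are introduced here.

```r
# Shop example: Y ~ Gamma with r = 2 and lambda = 1/2, P(Y > 5)
p_wait      <- pgamma(5, shape = 2, rate = 1 / 2, lower.tail = FALSE)
closed_form <- 7 / 2 * exp(-5 / 2)
print(c(p_wait = p_wait, closed_form = closed_form))

# Switchboard example: r = 5 and lambda = 2
r      <- 5
lambda <- 2
f_Y <- function(y) 2^5 * y^4 / factorial(4) * exp(-2 * y)   # p.d.f. for y >= 0
y <- c(0.5, 1, 2.5, 4)                                      # arbitrary evaluation points
print(cbind(y, formula = f_Y(y), dgamma = dgamma(y, shape = r, rate = lambda)))

mean_Y <- r / lambda
var_Y  <- r / lambda^2
print(c(mean_Y = mean_Y, var_Y = var_Y))
```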

The Gamma distribution can be extended to cases where \(r>0\) is not an integer by replacing \((r-1)!\) by \[\Gamma(r)=\int_{0}^{\infty}t^{r-1}e^{-t}\, dt.\] The exponential and the \(\chi^2\) distributions (we will discuss the latter later) are special cases of the Gamma distribution: \(\text{Exp}(\lambda)=\Gamma(\lambda,1)\) and \(\chi^2(r)=\Gamma(1/2,r/2)\).
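These relationships can be checked with R's density functions. In the sketch below (an illustration; the evaluation points x, the rate 3, and the 4 degrees of freedom are arbitrary choices), the exponential density is compared with a Gamma density of shape \(1\), and the \(\chi^2\) density with \(r\) degrees of freedom is compared with a Gamma density of shape \(r/2\) and rate \(1/2\).

```r
x <- c(0.1, 0.5, 1, 2, 5)

# Gamma function versus factorial: Gamma(r) = (r - 1)! for integer r
gamma_vs_factorial <- c(gamma(5), factorial(4))

# Exp(lambda) is a Gamma distribution with shape 1 and rate lambda
exp_density   <- dexp(x, rate = 3)
gamma_shape_1 <- dgamma(x, shape = 1, rate = 3)

# chi-squared with 4 degrees of freedom is a Gamma with shape 4/2 and rate 1/2
chisq_density <- dchisq(x, df = 4)
gamma_half    <- dgamma(x, shape = 4 / 2, rate = 1 / 2)

print(gamma_vs_factorial)
print(data.frame(x, exp_density, gamma_shape_1, chisq_density, gamma_half))
```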

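Gamma distributed numbers are generated by rgamma(), with parameters n, shape (which corresponds to \(r\)), and either rate (which corresponds to \(\lambda\)) or scale (which corresponds to \(1/\lambda\)).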

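We can draw from a \(\Gamma(3,2)\) distribution (that is, \(\lambda=3\) and \(r=2\), or shape \(2\) and scale \(1/3\)), for example, using:

rgamma(1,shape=2, scale=1/3)
[1] 2.249483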

This can be repeated 1000 times, say, and we get the empirical mean and variance:

q<-rgamma(1000,shape=2, scale=1/3)
mean(q)
[1] 0.6663675
var(q)
[1] 0.2205931

The corresponding histogram is displayed below:


3.3.6 Normal Approximation of the Binomial Distribution

If \(X\sim\mathcal{B}(n,p)\) then we may interpret \(X\) as a sum of independent and identically distributed random variables \[X=I_1+I_2+\cdots+I_n\ \text{ where each }\ I_i\sim\mathcal{B}(1,p)\,.\] Thus, according to the Central Limit Theorem (we will have more to say on the topic in a future section), for large \(n\) we have \[\frac{X-np}{\sqrt{np(1-p)}}\stackrel{\text{approx}}\sim\mathcal{N}(0,1)\,;\] for large \(n\) if \(X\stackrel{\text{exact}}\sim\mathcal{B}(n,p)\) then \(X\stackrel{\text{approx}}\sim\mathcal{N}(np,np(1-p))\).

Normal Approximation with Continuity Correction

When \(X\sim \mathcal{B}(n,p)\), we know that \(\text{E}[X]=np\) and \(\text{Var}[X]=np(1-p)\). If \(n\) is large, we may approximate \(X\) by a normal random variable in the following way: \[P(X\le x)=P(X<x+0.5)\approx P\left(Z<\frac{x-np+0.5}{\sqrt{np(1-p)}}\right)\] and \[P(X\ge x)=P(X>x-0.5)\approx P\left(Z>\frac{x-np-0.5}{\sqrt{np(1-p)}}\right).\] The continuity correction terms are the corresponding \(\pm 0.5\) in the expressions; they are required because a discrete distribution is being approximated by a continuous one.

Example: suppose \(X\sim\mathcal{B}(36,0.5)\). Provide a normal approximation to the probability \(P(X\leq 12)\).\(^{32}\)

Answer: the expectation and the variance of a binomial r.v. are known: \[\text{E}[X]=36(0.5)=18\quad\mbox{and}\quad \text{Var}[X]=36(0.5)(1-0.5) =9,\] and so \[\begin{aligned} P(X\leq12) &= P(X<12.5)= P\left( \frac{X-18}{3}<\frac{12-18+0.5}{3}\right)\\ &\stackrel{\text{norm.approx'n}}\approx\Phi(-1.83) \stackrel{\text{table}}\approx0.033\,.\end{aligned}\]
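The quality of the approximation in this example can be assessed against the exact binomial probability from pbinom(). The R sketch below is illustrative; the names exact, approx_cc, and approx_no_cc are introduced here.

```r
n <- 36
p <- 0.5
x <- 12

mu    <- n * p                  # E[X] = 18
sigma <- sqrt(n * p * (1 - p))  # SD[X] = 3

exact        <- pbinom(x, size = n, prob = p)   # exact P(X <= 12)
approx_cc    <- pnorm((x + 0.5 - mu) / sigma)   # with continuity correction
approx_no_cc <- pnorm((x - mu) / sigma)         # without continuity correction

print(c(exact = exact, approx_cc = approx_cc, approx_no_cc = approx_no_cc))
```

In this case the corrected approximation lands noticeably closer to the exact value than the uncorrected one.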

Computing Binomial Probabilities

There are thus at least four ways of computing (or approximating) binomial probabilities:

  • using the exact formula – if \(X\sim\mathcal{B}(n,p)\) then for each \(x=0,1,\ldots,n\), \(P(X=x)=\binom nxp^x(1-p)^{n-x}\);

  • using tables: if \(n\leq15\) and \(p\) is one of \(0.1,\ldots,0.9\), then the corresponding c.d.f. can be found in many textbooks (we must first express the desired probability in terms of the c.d.f. \(P(X\leq x)\)), such as in \[\begin{aligned} P(X<3)&=P(X\leq2); \\ P(X=7)&=P(X\leq7)-P(X\leq6) \,;\\ P(X>7)&=1-P(X\leq 7);\\ P(X\geq5)&=1-P(X\leq4),\, \text{ etc.}\end{aligned}\]

  • using statistical software (pbinom() in R, say), and

  • using the normal approximation when \(np\) and \(n(1-p)\) are both \(\geq5\): \[P(X\leq x)\approx \Phi\left( \frac{x+0.5-np}{\sqrt{np(1-p)}} \right)\] \[P(X\ge x)\approx 1-\Phi\left( \frac{x-0.5-np}{\sqrt{np(1-p)}} \right).\]

3.3.7 Other Continuous Distributions

Other common continuous distributions are listed in [30]:

  • the Beta distribution, a family of 2-parameter distributions with one mode and which is useful to estimate success probabilities (special cases: uniform, arcsine, PERT distributions);

  • the logit-normal distribution on \((0,1)\), which is used to model proportions;

  • the Kumaraswamy distribution, which is used in simulations in lieu of the Beta distribution (as it has a closed form c.d.f.);

  • the triangular distribution, which is typically used as a subjective description of a population for which there is only limited sample data (it is based on a knowledge of the minimum and maximum and a guess of the mode);

  • the chi-squared distribution, which is the distribution of the sum of the squares of \(n\) independent standard normal random variables, and which is used in goodness-of-fit tests in statistics;

  • the \(F\)-distribution, which is the distribution of the ratio of two independent chi-squared random variables (each divided by its degrees of freedom), used in the analysis of variance;

  • the Erlang distribution, which is the distribution of the sum of \(k\) independent and identically distributed exponential random variables, and which is used in queueing models (it is a special case of the Gamma distribution);

  • the Pareto distribution, which is used to describe financial data and critical behavior;

  • Student’s \(T\) distribution, which arises when estimating the mean of a normally-distributed population in situations where the sample size is small and the population’s standard deviation is unknown;

  • the logistic distribution, whose cumulative distribution function is the logistic function;

  • the log-normal distribution, which describes variables that are the product of many small independent positive variables;

  • etc.


[30] Wikipedia, “List of probability distributions,” 2021.
[31] R. E. Walpole, R. H. Myers, S. L. Myers, and K. Ye, Probability and Statistics for Engineers and Scientists, 8th ed. Pearson Education, 2007.