3.2 Discrete Distributions

The principles of probability theory introduced in the previous section are simple, and they are always valid. In this section and the next, we will see how some of the associated computations can be made easier with the use of distributions.

3.2.1 Random Variables and Distributions

Recall that, for any random “experiment”, the set of all possible outcomes is denoted by \({\cal S}\). A random variable (r.v.) is a function \(X:\mathcal{S}\to \mathbb{R}\), which is to say, it is a rule that associates a (real) number to every outcome of the experiment; \({\cal S}\) is the domain of the r.v. \(X\) and \(X(\mathcal{S})\subseteq \mathbb{R}\) is its range.

A probability distribution function (p.d.f.) is a function \(f:\mathbb{R}\to \mathbb{R}\) which specifies the probabilities of the values in the range \(X(\mathcal{S})\).

When \(\mathcal{S}\) is discrete, we say that \(X\) is a discrete r.v. and the p.d.f. is called a probability mass function (p.m.f.).


Throughout, we use the following notation:

  • capital roman letters (\(X\), \(Y\), etc.) denote r.v., and

  • corresponding lower case roman letters (\(x\), \(y\), etc.) denote generic values taken by the r.v.

A discrete r.v. can be used to define events: if \(X\) takes values \(X(\mathcal{S})=\{x_i\}\), then we can define the events \[A_i=\left\{s\in \mathcal{S}: X(s)=x_i \right\}:\]

  • the p.m.f. of \(X\) is \[f(x)=P\left( \left\{s\in \mathcal{S}: X(s)=x \right\} \right):=P(X=x);\]

  • its cumulative distribution function (c.d.f.) is \[F(x)=P(X\leq x).\]


If \(X\) is a discrete random variable with p.m.f. \(f(x)\) and c.d.f. \(F(x)\), then

  • \(0<f(x)\leq 1\) for all \(x\in X(\mathcal{S})\);

  • \(\sum_{x\in X(\mathcal{S})}f(x)=1\);

  • for any set of values \(A\subseteq X(\mathcal{S})\), \(P(X\in A)=\sum_{x\in A}f(x)\);

  • for any \(a,b\in \mathbb{R}\), \[\begin{aligned} P(a<X)&=1-P(X\leq a)=1-F(a) \\ P(X<b)&=P(X\leq b)-P(X=b)=F(b)-f(b)\end{aligned}\]

  • for any \(a\in \mathbb{R}\), \[\begin{aligned} P(a\leq X)&=1-P(X<a)\\&=1-(P(X\leq a)-P(X=a)) \\ &=1-F(a)+f(a). \end{aligned}\]

We can use these results to compute the probability of a discrete r.v. \(X\) falling in various intervals: \[\begin{aligned} P(a<X\leq b)&=P(X\leq b)-P(X\leq a)\\&=F(b)-F(a) \\ P(a\leq X\leq b)&=P(a<X\leq b)+P(X=a)\\&=F(b)-F(a)+f(a) \\ P(a<X< b)&=P(a<X\leq b)-P(X=b)\\&=F(b)-F(a)-f(b) \\ P(a\leq X<b)&=P(a\leq X\leq b)-P(X=b)\\&=F(b)-F(a)+f(a)-f(b) \end{aligned}\]


  • Flip a fair coin: the outcome space is \(\mathcal{S}=\{\text{Head}, \text{Tail}\}\). Let \(X:\mathcal{S}\to\mathbb{R}\) be defined by \(X(\text{Head})=1\) and \(X(\text{Tail})=0\). Then \(X\) is a discrete random variable (as a convenience, we write \(X=1\) and \(X=0\)).

    If the coin is fair, the p.m.f. of \(X\) is \(f:\mathbb{R}\to \mathbb{R}\), where \[\begin{aligned} f(0)&=P(X=0)=1/2,\ f(1)=P(X=1)=1/2,\\ f(x)&=0 \text{ for all other $x$}.\end{aligned}\]

  • Roll a fair die – the outcome space is \(\mathcal{S}=\{1,\ldots, 6\}\). Let \(X:\mathcal{S}\to\mathbb{R}\) be defined by \(X(i)=i\) for \(i=1,\ldots, 6\). Then \(X\) is a discrete r.v.

    If the die is fair, the p.m.f. of \(X\) is \(f:\mathbb{R}\to \mathbb{R}\), where \[\begin{aligned} f(i)&=P(X=i)=1/6, \ \text{for }i=1,\ldots, 6, \\ f(x)&=0 \text{ for all other $x$}.\end{aligned}\]

  • For the random variable \(X\) from the previous example, the c.d.f. is \(F:\mathbb{R}\to\mathbb{R}\), where \[\begin{aligned} F(x)&=P(X\leq x)= \begin{cases} 0 & \text{if $x<1$} \\ i/6 & \text{if $i\leq x<i+1$, $i=1,\ldots, 5$} \\ 1 & \text{if $x\geq 6$} \end{cases}\end{aligned}\]

  • For the same random variable, we can compute the probability \(P(3\le X\le 5)\) directly: \[\begin{aligned} P(3\leq X\leq 5)&=P(X=3)+P(X=4)+P(X=5)\\&=\textstyle{\frac{1}{6}+\frac{1}{6}+\frac{1}{6}=\frac{1}{2}},\end{aligned}\] or we can use the c.d.f.: \[\textstyle{P(3\leq X\leq 5)=F(5)-F(3)+f(3)=\frac{5}{6}-\frac{3}{6}+\frac{1}{6}=\frac{1}{2}.}\]

  • The number of calls received over a specific time period, \(X\), is a discrete random variable, with potential values \(0,1,2,\ldots\).

  • Consider a \(5-\)card poker hand consisting of cards selected at random from a \(52-\)card deck. Find the probability distribution of \(X\), where \(X\) indicates the number of red cards (\(\diamondsuit\) and \(\heartsuit\)) in the hand.

    Answer: in all there are \(\binom{52}{5}\) ways to select a \(5-\)card poker hand from a \(52-\)card deck. By construction, \(X\) can take on values \(x=0,1,2,3,4,5\).

    If \(X=0\), then none of the \(5\) cards in the hand is a \(\diamondsuit\) or a \(\heartsuit\), and all \(5\) cards in the hand are \(\spadesuit\) or \(\clubsuit\). There are thus \(\binom{26}{0}\cdot \binom{26}{5}\) \(5-\)card hands that only contain black cards, and \[P(X=0)=\frac{\binom{26}{0} \cdot \binom {26}{5}}{\binom {52}{5}}.\] In general, if \(X=x\), \(x=0,1,2,3,4,5\), there are \(\binom{26}{x}\) ways of having \(x\) \(\diamondsuit\) or \(\heartsuit\) in the hand, and \(\binom{26}{5-x}\) ways of having \(5-x\) \(\spadesuit\) or \(\clubsuit\) in the hand, so that \[\begin{aligned} f(x)&=P(X=x)=\begin{cases}\frac{\binom{26}{x}\cdot \binom {26}{5-x}}{\binom{52}{5}},\ x=0,1,2,3,4,5; \\ 0 \text{ otherwise.}\end{cases} \end{aligned}\]

  • Find the c.d.f. of a discrete random variable \(X\) with p.m.f. \(f(x)=0.1x\) if \(x=1,2,3,4\) and \(f(x)=0\) otherwise.

    Answer: \(f(x)\) is indeed a p.m.f. as \(0<f(x)\leq 1\) for all \(x\) and \[\sum_{x=1}^40.1x=0.1(1+2+3+4)=0.1\frac{4(5)}{2}=1.\] Computing \(F(x)=P(X\leq x)\) yields \[F(x)=\begin{cases} 0 & \text{if $x<1$} \\ 0.1 & \text{if $1\leq x<2$} \\ 0.3 & \text{if $2\leq x<3$} \\ 0.6 & \text{if $3\leq x<4$} \\ 1 & \text{if $x\geq 4$} \end{cases}\]

    The p.m.f. and the c.d.f. for this r.v. are shown in Figure 3.5.


Figure 3.5: P.m.f. and c.d.f. for the r.v. \(X\) defined above.
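The examples above can be checked numerically. What follows is a minimal R sketch, not a definitive implementation: the variable names (die_pmf, red_pmf, lin_cdf, etc.) are ours, and the comparison with dhyper() assumes that the count of red cards matches R's hypergeometric parameterization (26 "white" cards, 26 "black" cards, 5 draws).

```r
# Fair die: p.m.f. on 1..6, and c.d.f. as a running sum of the p.m.f.
x <- 1:6
die_pmf <- rep(1/6, 6)
die_cdf <- cumsum(die_pmf)

# P(3 <= X <= 5), computed directly and via F(5) - F(3) + f(3)
p_direct <- sum(die_pmf[x >= 3 & x <= 5])
p_cdf <- die_cdf[5] - die_cdf[3] + die_pmf[3]

# Number of red cards in a 5-card hand, from the counting formula
reds <- 0:5
red_pmf <- choose(26, reds) * choose(26, 5 - reds) / choose(52, 5)
red_check <- dhyper(reds, m = 26, n = 26, k = 5)  # assumed equivalent built-in

# p.m.f. f(x) = 0.1x on 1..4, and its c.d.f. at x = 1, 2, 3, 4
lin_pmf <- 0.1 * (1:4)
lin_cdf <- cumsum(lin_pmf)

print(c(direct = p_direct, via_cdf = p_cdf))
print(round(red_pmf, 4))
print(lin_cdf)
```

In each case the c.d.f. is obtained with cumsum(), which mirrors the definition \(F(x)=\sum_{t\leq x}f(t)\) for a discrete r.v.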

3.2.2 Expectation of a Discrete Random Variable

The expectation of a discrete random variable \(X\) is \[{\text{E} [X]}=\sum_x x\cdot P(X=x)=\sum_{x}xf(x)\,,\] where the sum extends over all values of \(x\) taken by \(X\).

The definition can be extended to a general function of \(X\): \[\text{E}[u(X)]=\sum_{x}u(x)P(X=x)=\sum_xu(x)f(x).\] As an important example, note that \[\text{E}[X^2]=\sum_x x^2P(X=x)=\sum_xx^2f(x).\]


  • What is the expectation of the roll \(Z\) of a \(6-\)sided die?

    Answer: if the die is fair, then \[\begin{aligned} \text{E} [Z]&=\sum_{z=1}^6z\cdot P(Z=z) =\frac{1}{6}\sum_{z=1}^6z\\&=\frac{1}{6}\cdot\frac{6(7)}{2}=3.5.\end{aligned}\]

  • For each \(1\$\) bet in a gambling game, a player can win \(3\$\) with probability \(\frac{1}{3}\) and lose \(1\$\) with probability \(\frac{2}{3}\). Let \(X\) be the net gain/loss from the game. Find the expected value of the game.

    Answer: \(X\) can take on the value \(2\$\) for a win and \(-2\$\) for a loss (outcome \(-\) bet). The expected value of \(X\) is thus \[\text{E}[X]=2\cdot\frac{1}{3}+(-2)\cdot\frac{2}{3}=-\frac{2}{3}.\]

  • If \(Z\) is the number showing on a roll of a fair \(6-\)sided die, find \(\text{E} [Z^2]\) and \(\text{E} [(Z-3.5)^2]\).

    Answer: \[\begin{aligned} \text{E}[Z^2]&= \sum_z z^2P(Z=z) = \frac{1}{6}\sum_{z=1}^6z^2\\ &=\frac16(1^2+\cdots+6^2)=\frac{91}{6}\\ \text{E}[(Z&-3.5)^2]=\sum_{z=1}^6(z-3.5)^2P(Z=z)\\&=\frac{1}{6}\sum_{z=1}^6(z-3.5)^2 \\ &=\frac{(1-3.5)^2+\cdots+(6-3.5)^2}{6}=\frac{35}{12}.\end{aligned}\]

The expectation of a random variable is simply the weighted average of the values that it takes, where each value is weighted by its probability.
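The sums in the examples above are short enough to evaluate by machine. Here is a minimal R sketch; the helper expect() is a hypothetical name introduced for illustration (it is not a base R function), and it simply implements \(\text{E}[u(X)]=\sum_x u(x)f(x)\).

```r
# E[u(X)] for a discrete r.v. with values x and p.m.f. f (hypothetical helper)
expect <- function(x, f, u = identity) sum(u(x) * f)

# Fair 6-sided die
z <- 1:6
f <- rep(1/6, 6)
mu   <- expect(z, f)                            # E[Z]
ez2  <- expect(z, f, function(t) t^2)           # E[Z^2]
dev2 <- expect(z, f, function(t) (t - mu)^2)    # E[(Z - E[Z])^2]

# Gambling game: net values 2 and -2 with probabilities 1/3 and 2/3
game <- expect(c(2, -2), c(1/3, 2/3))

print(c(mu = mu, ez2 = ez2, dev2 = dev2, game = game))
```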

Mean and Variance

We can interpret the expectation as the average or the mean of \(X\), which we often denote by \(\mu=\mu_X\). For instance, in the example of the fair die, \[\mu_Z=\text{E}[Z]=3.5\]

Note that in the final example, we could have written \[\text{E}[ (Z-3.5)^2 ]=\text{E}[ (Z-\text{E}[Z])^2 ].\] This is an important quantity associated to a random variable \(X\), its variance \(\text{Var}[X]\).

The variance of a discrete random variable \(X\) is the expected squared difference from the mean: \[\begin{aligned} \text{Var} [X]&= \text{E} [ (X-\mu_X)^2]= \sum_{x} (x-\mu_X)^2P(X=x)\\&=\sum_{x}\left(x^2-2x\mu_X+\mu_X^2\right)f(x) \\&= \sum_{x}x^2f(x)-2\mu_X\sum_{x}xf(x)+\mu_X^2\sum_{x}f(x)\\&= \text{E}[X^2]-2\mu_X\mu_X+\mu_X^2\cdot 1 \\ &=\text{E}[X^2]-\mu_X^2.\end{aligned}\] This is also sometimes written as \(\text{Var}[X]=\text{E}[X^2]-\text{E}^2[X]\).

Standard Deviation

The standard deviation of a discrete random variable \(X\) is defined directly from the variance: \[\text{SD}[X]=\sqrt{\text{Var} [X]}\,.\] The mean is a measure of centrality and it gives an idea as to where the bulk of a distribution is located; the variance and standard deviation provide information about the spread – distributions with higher variance/SD are more spread out about the average.

Example: let \(X\) and \(Y\) be random variables with the following p.m.f.:

\(x\) \(P(X=x)\) \(y\) \(P(Y=y)\)
\(-2\) \(1/5\) \(-4\) \(1/5\)
\(-1\) \(1/5\) \(-2\) \(1/5\)
\(0\) \(1/5\) \(0\) \(1/5\)
\(1\) \(1/5\) \(2\) \(1/5\)
\(2\) \(1/5\) \(4\) \(1/5\)

Compute the expected values and compare the variances.

Answer: we have \(\text{E} [X]=\text{E} [Y]=0\) and \[2=\text{Var}[X]<\text{Var}[Y]=8,\] meaning that we would expect both distributions to be centered at \(0\), but \(Y\) should be more spread-out than \(X\) (because its variance is greater, see Figure 3.6).


Figure 3.6: R.v. \(X\) (left) and \(Y\) (right), defined above.
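The variances in this example can be verified with a short R sketch. The helper pmf_var() is a hypothetical name of ours; it applies the definition \(\text{Var}[X]=\sum_x(x-\mu_X)^2f(x)\), and the result is compared with the shortcut \(\text{E}[X^2]-\mu_X^2\).

```r
# Values and (common) probabilities from the table above
x <- c(-2, -1, 0, 1, 2)
y <- c(-4, -2, 0, 2, 4)
p <- rep(1/5, 5)

# Variance of a discrete r.v. from its p.m.f. (hypothetical helper)
pmf_var <- function(v, f) {
  mu <- sum(v * f)          # mean
  sum((v - mu)^2 * f)       # expected squared deviation from the mean
}

var_x <- pmf_var(x, p)
var_y <- pmf_var(y, p)

# Shortcut formula E[X^2] - (E[X])^2 for comparison
var_x_alt <- sum(x^2 * p) - sum(x * p)^2

print(c(var_x = var_x, var_y = var_y, sd_x = sqrt(var_x), sd_y = sqrt(var_y)))
```

Note that R's built-in var() computes the sample variance of a data vector (dividing by \(n-1\)), which is why it is not used here.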


Let \(X,Y\) be random variables and \(a\in \mathbb{R}\). Then

  • \(\text{E} [aX]=a\text{E}[X]\);

  • \(\text{E} [X+a]= \text{E}[X]+a\);

  • \(\text{E} [X+Y]=\text{E}[X]+\text{E}[Y]\);

  • in general, \(\text{E} [XY]\neq \text{E}[X]\text{E}[Y]\);

  • \(\text{Var}[aX]=a^2\text{Var}[X]\), \(\text{SD}[aX]=|a|\text{SD}[X]\);

  • \(\text{Var}[X+a]=\text{Var} [X]\), \(\text{SD}[X+a]=\text{SD} [X]\).
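As a sketch of why the two variance properties hold, using only the definition of the variance and the first two properties of the expectation (so that \(\text{E}[aX]=a\mu_X\) and \(\text{E}[X+a]=\mu_X+a\)): \[\begin{aligned} \text{Var}[aX]&=\text{E}\left[(aX-a\mu_X)^2\right]=a^2\,\text{E}\left[(X-\mu_X)^2\right]=a^2\text{Var}[X],\\ \text{Var}[X+a]&=\text{E}\left[\left(X+a-(\mu_X+a)\right)^2\right]=\text{E}\left[(X-\mu_X)^2\right]=\text{Var}[X].\end{aligned}\] Taking square roots yields the corresponding statements for the standard deviation, with \(\sqrt{a^2}=|a|\).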

3.2.3 Binomial Distributions

Recall that the number of unordered samples of size \(r\) from a set of size \(n\) is \[_nC_r=\binom{n}{r}=\frac{n!}{(n-r)!r!}.\]


  • \(2!\times 4!=(1\times 2)\times (1\times 2\times 3\times 4)=48\), but \((2\times 4)!=8!=40320\).

  • \(\binom 5 1=\frac{5!}{1!\times 4!}=\frac{1\times 2\times 3\times 4\times 5}{1\times (1\times 2\times 3\times 4)}=\frac{5}{1}=5\).

  • In general: \(\binom n 1=n\) and \(\binom n 0=1\).

  • \(\binom 6 2=\frac{6!}{2!\times 4!}=\frac{4!\times 5\times 6}{2!\times 4!}=\frac{5\times 6}{2}=15\).

  • \(\binom {27} {22}=\frac{27!}{22!\times 5!}=\frac{22!\times 23\times 24\times 25\times 26\times 27}{5!\times 22!}=\frac{23\times 24\times 25\times 26\times 27}{120}\).
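These computations can be reproduced with the base R functions factorial() and choose(); the variable names in this short sketch are ours.

```r
# Factorials do not distribute over products
prod_of_fact <- factorial(2) * factorial(4)   # 2! x 4!
fact_of_prod <- factorial(2 * 4)              # (2 x 4)!

# Binomial coefficients
c_5_1   <- choose(5, 1)
c_6_2   <- choose(6, 2)
c_27_22 <- choose(27, 22)   # same as choose(27, 5), by symmetry

print(c(prod_of_fact, fact_of_prod, c_5_1, c_6_2, c_27_22))
```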

Binomial Experiments

A Bernoulli trial is a random experiment with two possible outcomes, “success” and “failure”. Let \(p\) denote the probability of a success.

A binomial experiment consists of \(n\) repeated independent Bernoulli trials, each with the same probability of success, \(p\), such as:

  • female/male births (perhaps not truly independent, but often treated as such);

  • satisfactory/defective items on a production line;

  • sampling with replacement with two types of items;

  • etc.

Probability Mass Function

In a binomial experiment of \(n\) independent trials, each with probability of success \(p\), the number of successes \(X\) is a discrete random variable that follows a binomial distribution with parameters \((n,p)\): \[f(x)=P(X=x)=\binom nx p^x(1-p)^{n-x}\,,\ \text{ for $x=0,1,2,\ldots,n$.}\] This is often abbreviated to “\(X\sim\mathcal{B}(n,p)\)”.

If \(X\sim \mathcal{B}(1,p)\), then \(P(X=0)=1-p\) and \(P(X=1)=p\), so \[\text{E} [X]=(1-p)\cdot0 + p\cdot1=p\,.\]

Expectation and Variance

If \(X\sim \mathcal{B}(n,p)\), it can be shown that \[\text{E} [X]= \sum_{x=0}^n xP(X=x) =np,\] and \[\text{Var}[X]= \text{E}\left[(X-np)^2 \right] = \sum_{x=0}^n (x-np)^2 P(X=x)=np(1-p)\] (we will eventually see an easier way to derive these formulas by interpreting \(X\) as a sum of other discrete random variables).
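Although the formulas are not derived here, they can be checked numerically for any particular choice of parameters. The sketch below uses \(n=20\) and \(p=0.3\) (arbitrary values picked for illustration) and evaluates the two sums with the base R function dbinom(), which returns the binomial p.m.f.

```r
n <- 20    # number of trials (arbitrary illustrative value)
p <- 0.3   # probability of success (arbitrary illustrative value)

x <- 0:n
f <- dbinom(x, size = n, prob = p)   # P(X = x) for x = 0, ..., n

mean_x <- sum(x * f)                 # sum of x P(X = x)
var_x  <- sum((x - n * p)^2 * f)     # sum of (x - np)^2 P(X = x)

print(c(mean = mean_x, np = n * p, var = var_x, npq = n * p * (1 - p)))
```

A numerical check of this kind is not a proof, but it is a quick way to catch a misremembered formula.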

Recognizing that certain situations can be modeled via a distribution whose p.m.f. and c.d.f. are already known can simplify eventual computations.


  • Suppose that water samples taken in some well-defined region have a \(10\%\) probability of being polluted. If \(12\) samples are selected independently, then it is reasonable to model the number \(X\) of polluted samples as \(\mathcal{B}(12,0.1)\). Find:


    1. \(\text{E} [X]\) and \(\text{Var}[X]\);

    2. \(P(X=3)\);

    3. \(P(X\leq 3)\).


    Answer:

    1. If \(X\sim\mathcal{B}(n,p)\), then \[\text{E} [X]=np\quad\text{and}\quad \text{Var}[X]=np(1-p).\] With \(n=12\) and \(p=0.1\), we obtain \[\begin{aligned} \text{E} [X]&= 12\times0.1=1.2;\\ \text{Var}[X]&=12\times0.1\times0.9=1.08\,.\end{aligned}\]

    2. By definition, \[P(X=3)=\binom{12}3(0.1)^3(0.9)^{9}\approx0.0852.\]

    3. By definition, \[\begin{aligned} P(X\leq 3)&=\sum_{x=0}^3P(X=x) \\&=\sum_{x=0}^3\binom{12}{x}(0.1)^x(0.9)^{12-x}. \end{aligned}\] This sum can be computed directly; however, for \(X\sim \mathcal{B}(12,0.1)\), \(P(X\leq 3)\) can also be read from tabulated values (as in Figure 3.7):


      Figure 3.7: Tabulated c.d.f. values for the binomial distribution with \(n=12\) [source unknown].

      The appropriate value \(\approx 0.9744\) can be found in the group corresponding to \(n=12\), in the row corresponding to \(x=3\), and in the column corresponding to \(p=0.1\).

      The table can also be used to compute \[\begin{aligned} P(X=3)&=P(X\leq 3)-P(X\leq 2)\\&=0.9744-0.8891\approx 0.0853.\end{aligned}\]

  • An airline sells \(101\) tickets for a flight with \(100\) seats. Each passenger with a ticket is known to have a probability \(p=0.97\) of showing up for their flight. What is the probability of \(101\) passengers showing up (and the airline being caught overbooking)? Make appropriate assumptions. What if the airline sells 125 tickets?

    Answer: let \(X\) be the number of passengers that show up. We want to compute \(P(X>100)\).

    If all passengers show up independently of one another (no families or late bus?), we can model \(X\sim \mathcal{B}(101,0.97)\) and \[\begin{aligned} P(X&>100)=P(X=101)\\&=\binom{101}{101}(0.97)^{101}(0.03)^0\approx 0.046.\end{aligned}\] If the airline sells \(n=125\) tickets, we can model the situation with the binomial distribution \(\mathcal{B}(125,0.97)\), so that \[\begin{aligned} P(X&>100)=1-P(X\leq 100)\\&=1-\sum_{x=0}^{100}\binom{125}{x}(0.97)^x(0.03)^{125-x}.\end{aligned}\] This sum is harder to compute directly, but is very nearly \(1\) (try it in R, say).

    Do these results match your intuition?

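We can evaluate related probabilities in R via the base functions rbinom(), dbinom(), pbinom(), etc. The distribution's parameters are size (the number of trials \(n\)) and prob (the probability of success \(p\)); the first argument is the number of observations to draw, n, for rbinom(), and the value(s) at which to evaluate the p.m.f. or the c.d.f., x or q, for dbinom() and pbinom().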

We can draw an observation \(X\) from a binomial distribution \(\mathcal{B}(11,0.2)\) in R as follows:

rbinom(1, size=11, prob=0.2)
[1] 5

We could also replicate the process 1000 times (and extract the empirical expectation and variance):

v <- rbinom(1000, size=11, prob=0.2)
mean(v)
[1] 2.236
var(v)
[1] 1.794098

The histogram of the sample is shown below.

brks = min(v):max(v) 
hist(v, breaks = brks)

If we change the parameters of the distribution (to \(\mathcal{B}(19,0.7)\)), we get a different looking histogram (and a different expectation and variance).

v <- rbinom(1000, size=19, prob=0.7)
mean(v)
[1] 13.308
var(v)
[1] 4.253389
brks = min(v):max(v) 
hist(v, breaks = brks)
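Returning to the water-sample and airline examples of this section, the exact probabilities can be obtained without tables. The following R sketch uses dbinom() for the p.m.f. and pbinom() for the c.d.f.; the variable names are ours, and the independence assumptions are those made in the examples.

```r
# Water samples: X ~ B(12, 0.1)
p_eq3     <- dbinom(3, size = 12, prob = 0.1)   # P(X = 3)
p_le3     <- pbinom(3, size = 12, prob = 0.1)   # P(X <= 3)
p_eq3_alt <- p_le3 - pbinom(2, size = 12, prob = 0.1)   # F(3) - F(2)

# Airline, 101 tickets: P(X > 100) = P(X = 101) for X ~ B(101, 0.97)
over_101 <- dbinom(101, size = 101, prob = 0.97)

# Airline, 125 tickets: P(X > 100) for X ~ B(125, 0.97)
over_125 <- pbinom(100, size = 125, prob = 0.97, lower.tail = FALSE)

print(round(c(p_eq3 = p_eq3, p_le3 = p_le3, over_101 = over_101), 4))
print(over_125)
```

The argument lower.tail = FALSE asks pbinom() for \(P(X>q)\) directly, which avoids the subtraction \(1-P(X\leq q)\) when the answer is very close to \(1\).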

3.2.4 Geometric Distributions

Now consider a sequence of Bernoulli trials, with probability \(p\) of success at each step. Let the geometric random variable \(X\) denote the number of trials needed to obtain the first success (the successful trial included). The probability mass function is given by \[f(x)=P(X=x)=(1-p)^{x-1}p,\quad x=1,2,\ldots \] denoted \(X\sim \text{Geo}(p)\). For this random variable, we have \[\text{E}[X]=\frac{1}{p} \quad\mbox{and}\quad \text{Var}[X]=\frac{1-p}{p^2}.\]


  • A fair \(6-\)sided die is thrown until it shows a \(6\). What is the probability that \(5\) throws are required?

    Answer: If \(5\) throws are required, we have to compute \(P(X=5)\), where \(X\) is geometric \(\text{Geo}(1/6)\): \[P(X=5)=(1-p)^{5-1}p=(5/6)^4(1/6)\approx 0.0804.\]

  • In the example above, how many throws would you expect to need?

    Answer: \(\text{E}[X]=\frac{1}{1/6}=6\).
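These values can be checked in R with dgeom(), with one caveat: R's geometric distribution counts the number of failures before the first success (values \(0,1,2,\ldots\)), so \(P(X=x)\) in the notation of this section corresponds to dgeom(x - 1, prob). The sketch below is illustrative only; the truncation of the infinite sum at 2000 terms is our choice.

```r
p <- 1/6

# P(X = 5): four failures followed by a success
p_five     <- dgeom(5 - 1, prob = p)
by_formula <- (1 - p)^(5 - 1) * p

# E[X], approximated by truncating the infinite sum (the tail is negligible)
x <- 1:2000
mean_throws <- sum(x * dgeom(x - 1, prob = p))

print(c(p_five = p_five, by_formula = by_formula, mean_throws = mean_throws))
```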

3.2.5 Negative Binomial Distributions

Consider now a sequence of Bernoulli trials, with probability \(p\) of success at each step. Let the negative binomial random variable \(X\) denote the number of trials needed to obtain the \(r\)th success (the \(r\)th successful trial included).

The probability mass function is given by \[f(x)=P(X=x)=\binom{x-1}{r-1}(1-p)^{x-r}p^r,\quad x=r,r+1,\ldots\] which we denote by \(X\sim \text{NegBin}(p,r)\).

For this random variable, we have \[\text{E}[X]=\frac{r}{p} \quad\mbox{and}\quad \text{Var}[X]=\frac{r(1-p)}{p^2}.\]


  • A fair \(6-\)sided die is thrown until three \(6\)’s are rolled. What is the probability that \(5\) throws are required?

    Answer: If \(5\) throws are required, we have to compute \(P(X=5)\), where \(X\) is negative binomial \(\text{NegBin}(1/6,3)\): \[\begin{aligned} P(X=5)&=\binom{5-1}{3-1}(1-p)^{5-3}p^3\\&=\binom{4}{2}(5/6)^2(1/6)^3\approx 0.0193. \end{aligned}\]

  • In the example above, how many throws would you expect to need?

    Answer: \(\text{E}[X]=\frac{3}{1/6}=18\).
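As with the geometric distribution, R's dnbinom() counts failures rather than trials: it gives the probability of a given number of failures before the size-th success. In the notation of this section, \(P(X=x)\) therefore corresponds to dnbinom(x - r, size = r, prob = p). A minimal sketch follows (variable names and the truncation point of the sum are ours).

```r
p <- 1/6
r <- 3

# P(X = 5): two failures before the third success
p_five     <- dnbinom(5 - r, size = r, prob = p)
by_formula <- choose(5 - 1, r - 1) * (1 - p)^(5 - r) * p^r

# E[X], approximated by truncating the infinite sum (the tail is negligible)
x <- r:5000
mean_throws <- sum(x * dnbinom(x - r, size = r, prob = p))

print(c(p_five = p_five, by_formula = by_formula, mean_throws = mean_throws))
```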

3.2.6 Poisson Distributions

Let us say we are counting the number of “changes” that occur in a continuous interval of time or space.

We have a Poisson process with rate \(\lambda\), denoted by \(\mathcal{P}(\lambda)\), if:

  1. the numbers of changes occurring in non-overlapping intervals are independent;

  2. the probability of exactly one change in a short interval of length \(h\) is approximately \(\lambda h\), and

  3. the probability of \(2+\) changes in a sufficiently short interval is essentially \(0\).

Assume that an experiment satisfies the above properties. Let \(X\) be the number of changes in a unit interval (this could be \(1\) day, or \(15\) minutes, or \(10\) years, etc.).

What is \(P(X=x)\), for \(x=0, 1, \ldots\)? We can get to the answer by first partitioning the unit interval into \(n\) disjoint sub-intervals of length \(1/n\). Then,

  1. by the second condition, the probability of one change occurring in one of the sub-intervals is approximately \(\lambda/n\);

  2. by the third condition, the probability of \(2+\) changes is \(\approx 0\), and

  3. by the first condition, we have a sequence of \(n\) Bernoulli trials with probability \(p=\lambda/n\).

Therefore, \[\begin{aligned} f(x)&=P(X=x) \approx \frac{n!}{x!(n-x)!}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x} \\ &=\frac{\lambda^x}{x!}\cdot\underbrace{\frac{n!}{(n-x)!}\cdot\frac{1}{n^x}}_{\text{term $1$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{n}}_{\text{term $2$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{-x}}_{\text{term $3$}}.\end{aligned}\] Letting \(n\to\infty\), we obtain \[\begin{aligned} P(X=x)&=\lim_{n\to\infty}\frac{\lambda^x}{x!}\cdot\underbrace{\frac{n!}{(n-x)!}\cdot\frac{1}{n^x}}_{\text{term $1$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{n}}_{\text{term $2$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{-x}}_{\text{term $3$}} \\ &=\frac{\lambda^x}{x!}\cdot 1 \cdot \exp(-\lambda)\cdot 1 = \frac{\lambda^xe^{-\lambda}}{x!}, \quad x=0, 1, \ldots \end{aligned}\] Let \(X\sim \mathcal{P}(\lambda)\). Then it can be shown that \[\text{E} [X]=\lambda \quad\mbox{and}\quad \text{Var} [X]=\lambda,\] that is, the mean and the variance of a Poisson random variable are identical.
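The limiting argument can be illustrated numerically: for a fixed \(\lambda\), the p.m.f. of \(\mathcal{B}(n,\lambda/n)\) should approach that of \(\mathcal{P}(\lambda)\) as \(n\) grows. The R sketch below is only an illustration under assumed values (\(\lambda=3\), the range \(x=0,\ldots,10\), and the sequence of \(n\) are our choices); gap() is a hypothetical helper name.

```r
lambda <- 3            # illustrative rate
x <- 0:10              # values at which the two p.m.f.s are compared
pois <- dpois(x, lambda)

# Largest absolute difference between the B(n, lambda/n) and Poisson p.m.f.s
gap <- function(n) max(abs(dbinom(x, size = n, prob = lambda / n) - pois))

n_values <- c(10, 100, 1000, 10000)
gaps <- sapply(n_values, gap)

print(data.frame(n = n_values, max_abs_diff = gaps))
```

The differences shrink as \(n\) increases, in line with the three limits used in the derivation (terms \(1\) and \(3\) tend to \(1\), and term \(2\) tends to \(e^{-\lambda}\)).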

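We can compute related probabilities in R via the base functions rpois(), dpois(), ppois(), etc. The distribution's parameter is lambda; the first argument is the number of observations to draw, n, for rpois(), and the value(s) at which to evaluate the p.m.f. or the c.d.f., x or q, for dpois() and ppois().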

We can draw a sample of size 1 from \(\mathcal{P}(13)\) in R as follows:

rpois(1, lambda=13)
[1] 18

We sample independently 500 times; this yields an empirical expectation and variance.

u <- rpois(500, lambda=13); head(u)   # 500 independent draws, first six shown
[1] 13 12 14 12 18  9
mean(u)
[1] 12.874
var(u)
[1] 12.92798

The sample’s histogram is shown below.



  • A traffic flow is typically modeled by a Poisson distribution. It is known that the traffic flowing through an intersection is \(6\) cars/minute, on average. What is the probability of no cars entering the intersection in a \(30\) second period?

    Answer: \(6\) cars/min \(=\) \(3\) cars/\(30\) sec. Thus \(\lambda=3\), and we need to compute \[P(X=0)=\frac{3^0e^{-3}}{0!}=\frac{e^{-3}}{1}\approx 0.0498.\]

  • A hospital needs to schedule night shifts in the maternity ward. It is known that there are \(3000\) deliveries per year; if these happened randomly round the clock (is this a reasonable assumption?), we would expect \(1000\) deliveries between the hours of midnight and 8:00 a.m., a time when much of the staff is off-duty.

    It is thus important to ensure that the night shift is sufficiently staffed to allow the maternity ward to cope with the workload on any particular night, or at least, on a high proportion of nights.

    The average number of deliveries per night is \[\lambda = 1000/365.25\approx 2.74.\] If the daily number \(X\) of night deliveries follows a Poisson process \(\mathcal{P}(\lambda)\), we can compute the probability of delivering \(x=0,1,2,\ldots\) babies on each night.

    For a Poisson distribution, the probability mass values \(f(x)\) can be obtained using dpois() (for a general distribution, replace the r in the rxxxxx(...) random number generators by d: dxxxxx(...)).

    We set up the Poisson distribution parameters and the distribution’s range (in theory, it goes to infinity, but we have got to stop somewhere in practice).

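    lambda = 2.74; x = 0:10                          # rate, and truncated range of values
    pmf = dpois(x, lambda); cdf = ppois(x, lambda)   # p.m.f. and c.d.f. on that range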

    The probability mass and cumulative distribution functions are shown below:

    x pmf cdf
    0 0.0645703 0.0645703
    1 0.1769228 0.2414931
    2 0.2423842 0.4838773
    3 0.2213775 0.7052548
    4 0.1516436 0.8568984
    5 0.0831007 0.9399991
    6 0.0379493 0.9779484
    7 0.0148544 0.9928029
    8 0.0050876 0.9978905
    9 0.0015489 0.9994394
    10 0.0004244 0.9998638

    Here is the p.m.f. plot:

    plot(x,pmf, type="h", col=2, main="Poisson PMF", 
         xlab="x", ylab="f(x)=P(X=x)")
    points(x,pmf, col=2)
    abline(h=0, col=4)

    and the c.d.f. plot:

    # prepend the point (-1, 0) so that the step function starts at F(x)=0 for x<0
    plot(c(-1,x),c(0,cdf), type="s", col=2, main="Poisson CDF", 
         xlab="x", ylab="F(x)=P(X<=x)")
    abline(h=0:1, col=4)

  • If the maternity ward wants to prepare for the greatest possible traffic on \(80\%\) of the nights, how many deliveries should be expected?

    Answer: we seek an \(x\) for which \[P(X\leq x-1)\leq 0.80\leq P(X\leq x).\]

    Let’s plot the height \(F(x)=0.8\) on the c.d.f.:

    # prepend the point (-1, 0) so that the step function starts at F(x)=0 for x<0
    plot(c(-1,x),c(0,cdf), type="s", col=2, main="Poisson CDF", 
         xlab="x", ylab="F(x)=P(X<=x)")
    abline(h=0:1, col=4)
    abline(h=0.8, col=1)

    The \(y=0.8\) line crosses the c.d.f. at \(x=4\); let’s evaluate \(F(3)=P(X\leq 3)\) and \(F(4)=P(X\leq 4)\) to confirm that \(F(3)\leq 0.8 \leq F(4)\).

    ppois(3, lambda)
    [1] 0.7052548
    ppois(4, lambda)
    [1] 0.8568984

    It is indeed the case. Thus, if the hospital prepares for \(4\) deliveries a night, they will be ready for the worst on at least \(80\%\) of the nights (closer to \(85.7\%\), actually).

    Note that this is different from asking how many deliveries are expected nightly (namely, \(\text{E}[X]=2.74\)).

  • On how many nights in the year would \(5\) or more deliveries be expected?

    Answer: we need to evaluate \[\begin{aligned} 365.25\cdot P(X\ge 5)&=365.25 (1-P(X\le 4)). \end{aligned}\]

    365.25*(1-ppois(4, lambda))
    [1] 52.26785

    Thus, \(5\) or more deliveries are expected on roughly \(52\) nights a year, or about \(14\%\) of the nights.

  • Over the course of one year, what is the greatest number of deliveries expected on any night?

    Answer: we are looking for the largest value of \(x\) for which \(365.25\cdot P(X=x)\geq 1\) (if \(365.25\cdot P(X=x)<1\), then the probability of that number of deliveries is too low to expect that we would ever see it during the year).

    The expected number of nights with each number of deliveries can be computed using:

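    nights = 365.25*dpois(0:10, lambda)   # expected number of nights with 0, 1, ..., 10 deliveries
    rbind(0:10, nights)
               [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
            0.00000  1.00000  2.00000  3.00000  4.00000  5.00000  6.00000 7.000000
    nights 23.58432 64.62103 88.53082 80.85815 55.38783 30.35253 13.86099 5.425587
               [,9]    [,10]      [,11]
           8.000000 9.000000 10.0000000
    nights 1.858264 0.565738  0.1550122

    The largest index is:

    max(which(nights >= 1)) - 1   # which() is 1-based, whereas x starts at 0
    [1] 8

    Namely, \(x=8\). Indeed, for larger values of \(x\), \(365.25\cdot P(X=x)<1\).

    nights[9]    # x = 8
    [1] 1.858264
    nights[10]   # x = 9
    [1] 0.565738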
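The Poisson examples above can also be condensed with the built-in quantile function. The sketch below is a hedged summary rather than the method used in the text: qpois(p, lambda) returns the smallest \(x\) with \(F(x)\geq p\), which is exactly the condition \(P(X\leq x-1)\leq 0.80\leq P(X\leq x)\) when the first inequality is strict. The variable names are ours, and \(\lambda=2.74\) is the rounded value used above.

```r
# Traffic example: no cars in 30 seconds, with lambda = 3
p_no_cars <- dpois(0, lambda = 3)

# Maternity ward example, with the rounded rate used in the text
lambda <- 2.74

# Smallest number of deliveries x such that P(X <= x) >= 0.80
cover_80 <- qpois(0.80, lambda)

# Expected number of nights per year with 5 or more deliveries
busy_nights <- 365.25 * ppois(4, lambda, lower.tail = FALSE)

print(c(p_no_cars = p_no_cars, cover_80 = cover_80, busy_nights = busy_nights))
```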

3.2.7 Other Discrete Distributions

Wikipedia [30] lists other common discrete distributions:

  • the Rademacher distribution, which takes values \(1\) and \(-1\), each with probability \(1/2\);

  • the beta binomial distribution, which describes the number of successes in a series of independent Bernoulli experiments with heterogeneity in the success probability;

  • the discrete uniform distribution, where all elements of a finite set are equally likely (balanced coin, unbiased die, first card of a well-shuffled deck, etc.);

  • the hypergeometric distribution, which describes the number of successes in the first \(m\) of a series of \(n\) consecutive Bernoulli experiments, if the total number of successes is known;

  • the negative hypergeometric distribution, which describes the number of attempts needed to get the \(n\)th success in a series of Bernoulli experiments when sampling without replacement;

  • the Poisson binomial distribution, which describes the number of successes in a series of independent Bernoulli experiments with different success probabilities;

  • Benford’s Law, which describes the frequency of the first digit of many naturally occurring data;

  • Zipf’s Law, which describes the frequency of words in the English language;

  • the beta negative binomial distribution, which describes the number of failures needed to obtain \(r\) successes in a sequence of independent Bernoulli experiments;

  • etc.


[30] Wikipedia, “List of probability distributions,” 2021.