## 3.2 Discrete Distributions

The principles of probability theory introduced in the previous section are simple, and they are always valid. In this section and the next, we will see how some of the associated computations can be made easier with the use of distributions.

### 3.2.1 Random Variables and Distributions

Recall that, for any random “experiment”, the set of all possible outcomes is denoted by \({\cal S}\). A **random variable** (r.v.) is a function \(X:\mathcal{S}\to \mathbb{R}\), which is to say, it is a rule that associates a (real) number to every outcome of the experiment; \({\cal S}\) is the **domain** of the r.v. \(X\) and \(X(\mathcal{S})\subseteq \mathbb{R}\) is its **range**.

A **probability distribution function** (p.d.f.) is a function
\(f:\mathbb{R}\to \mathbb{R}\) which specifies the probabilities of the values in the
range \(X(\mathcal{S})\).

When \(\mathcal{S}\) is **discrete**,^{26} we say that \(X\) is a **discrete
r.v.** and the p.d.f. is called a **probability mass function**
(p.m.f.).

#### Notation

Throughout, we use the following notation:

capital roman letters (\(X\), \(Y\), etc.) denote r.v., and corresponding lower case roman letters (\(x\), \(y\), etc.) denote *generic values taken* by the r.v.

A discrete r.v. can be used to **define events** – if \(X\) takes values \(X(\mathcal{S})=\{x_i\}\), then we can define the events \[A_i=\left\{s\in \mathcal{S}: X(s)=x_i \right\}:\]

the p.m.f. of \(X\) is \[f(x)=P\left( \left\{s\in \mathcal{S}: X(s)=x \right\} \right):=P(X=x);\]

its **cumulative distribution function** (c.d.f.) is \[F(x)=P(X\leq x).\]

#### Properties

If \(X\) is a discrete random variable with p.m.f. \(f(x)\) and c.d.f. \(F(x)\), then

- \(0<f(x)\leq 1\) for all \(x\in X(\mathcal{S})\);

- \(\sum_{s\in \mathcal{S}}f(X(s))=\sum_{x\in X(\mathcal{S})}f(x)=1\);

- for any event \(A\subseteq X(\mathcal{S})\), \(P(X\in A)=\sum_{x\in A}f(x)\);

- for any \(a,b\in \mathbb{R}\), \[\begin{aligned} P(a<X)&=1-P(X\leq a)=1-F(a) \\ P(X<b)&=P(X\leq b)-P(X=b)=F(b)-f(b)\end{aligned}\]

- for any \(a,b\in \mathbb{R}\), \[\begin{aligned} P(a\leq X)&=1-P(X<a)\\&=1-(P(X\leq a)-P(X=a)) \\ &=1-F(a)+f(a) \end{aligned}\]

We can use these results to compute the probability of a **discrete**
r.v. \(X\) falling in various intervals: \[\begin{aligned}
P(a<X\leq b)&=P(X\leq b)-P(X\leq a)\\&=F(b)-F(a) \\
P(a\leq X\leq b)&=P(a<X\leq b)+P(X=a)\\&=F(b)-F(a)+f(a) \\
P(a<X< b)&=P(a<X\leq b)-P(X=b)\\&=F(b)-F(a)-f(b) \\
P(a\leq X<b)&=P(a\leq X\leq b)-P(X=b)\\&=F(b)-F(a)+f(a)-f(b) \end{aligned}\]
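These interval identities are easy to verify numerically. A minimal `R` sketch for a fair \(6-\)sided die (the helper names `pmf` and `cdf` are local to this snippet):

```r
# Fair die: f(x) = 1/6 for x = 1..6, and F(x) = P(X <= x)
pmf <- function(x) ifelse(x %in% 1:6, 1/6, 0)
cdf <- function(x) sum(pmf(seq_len(floor(x))))
a <- 3; b <- 5
p.direct <- sum(pmf(a:b))             # P(3 <= X <= 5) by direct summation
p.cdf <- cdf(b) - cdf(a) + pmf(a)     # F(b) - F(a) + f(a)
c(p.direct, p.cdf)                    # both equal 1/2
```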

**Examples:**

Flip a fair coin – the outcome space is \(\mathcal{S}=\{\text{Head}, \text{Tail}\}\). Let \(X:\mathcal{S}\to\mathbb{R}\) be defined by \(X(\text{Head})=1\) and \(X(\text{Tail})=0\). Then \(X\) is a discrete random variable (as a convenience, we write \(X=1\) and \(X=0\)).

If the coin is fair, the p.m.f. of \(X\) is \(f:\mathbb{R}\to \mathbb{R}\), where \[\begin{aligned} f(0)&=P(X=0)=1/2,\ f(1)=P(X=1)=1/2,\\ f(x)&=0 \text{ for all other $x$}.\end{aligned}\]

Roll a fair die – the outcome space is \(\mathcal{S}=\{1,\ldots, 6\}\). Let \(X:\mathcal{S}\to\mathbb{R}\) be defined by \(X(i)=i\) for \(i=1,\ldots, 6\). Then \(X\) is a discrete r.v.

If the die is fair, the p.m.f. of \(X\) is \(f:\mathbb{R}\to \mathbb{R}\), where \[\begin{aligned} f(i)&=P(X=i)=1/6, \ \text{for }i=1,\ldots, 6, \\ f(x)&=0 \text{ for all other $x$}.\end{aligned}\]

For the random variable \(X\) from the previous example, the c.d.f. is \(F:\mathbb{R}\to\mathbb{R}\), where \[\begin{aligned} F(x)&=P(X\leq x)= \begin{cases} 0 & \text{if $x<1$} \\ i/6 & \text{if $i\leq x<i+1$, $i=1,\ldots, 6$} \\ 1 & \text{if $x\geq 6$} \end{cases}\end{aligned}\]

For the same random variable, we can compute the probability \(P(3\le X\le 5)\) directly: \[\begin{aligned} P(3\leq X\leq 5)&=P(X=3)+P(X=4)+P(X=5)\\&=\textstyle{\frac{1}{6}+\frac{1}{6}+\frac{1}{6}=\frac{1}{2}},\end{aligned}\] or we can use the c.d.f.: \[\textstyle{P(3\leq X\leq 5)=F(5)-F(3)+f(3)=\frac{5}{6}-\frac{3}{6}+\frac{1}{6}=\frac{1}{2}.}\]

The number of calls received over a specific time period, \(X\), is a discrete random variable, with potential values \(0,1,2,\ldots\).

Consider a \(5-\)card poker hand consisting of cards selected at random from a \(52-\)card deck. Find the probability distribution of \(X\), where \(X\) indicates the number of red cards (\(\diamondsuit\) and \(\heartsuit\)) in the hand.

**Answer:** in all there are \(\binom{52}{5}\) ways to select a \(5-\)card poker hand from a \(52-\)card deck. By construction, \(X\) can take on values \(x=0,1,2,3,4,5\). If \(X=0\), then none of the \(5\) cards in the hand are \(\diamondsuit\) or \(\heartsuit\), and all of the \(5\) cards in the hand are \(\spadesuit\) or \(\clubsuit\). There are thus \(\binom{26}{0}\cdot \binom{26}{5}\) \(5-\)card hands that only contain black cards, and \[P(X=0)=\frac{\binom{26}{0} \cdot \binom {26}{5}}{\binom {52}{5}}.\] In general, if \(X=x\), \(x=0,1,2,3,4,5\), there are \(\binom{26}{x}\) ways of having \(x\) \(\diamondsuit\) or \(\heartsuit\) in the hand, and \(\binom{26}{5-x}\) ways of having \(5-x\) \(\spadesuit\) and \(\clubsuit\) in the hand, so that \[\begin{aligned} f(x)&=P(X=x)=\begin{cases}\frac{\binom{26}{x}\cdot \binom {26}{5-x}}{\binom{52}{5}},\ x=0,1,2,3,4,5; \\ 0 \text{ otherwise}\end{cases} \end{aligned}\]
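This p.m.f. can be evaluated in `R` with the base function `choose()`; as a check, the probabilities sum to \(1\) and agree with the built-in hypergeometric p.m.f. `dhyper()` (a sketch):

```r
# Number of red cards in a 5-card hand: f(x) = C(26,x) C(26,5-x) / C(52,5)
x <- 0:5
f <- choose(26, x) * choose(26, 5 - x) / choose(52, 5)
sum(f)                                          # the probabilities sum to 1
all.equal(f, dhyper(x, m = 26, n = 26, k = 5))  # matches the hypergeometric p.m.f.
```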

Find the c.d.f. of a discrete random variable \(X\) with p.m.f. \(f(x)=0.1x\) if \(x=1,2,3,4\) and \(f(x)=0\) otherwise.

**Answer:** \(f(x)\) is indeed a p.m.f. as \(0<f(x)\leq 1\) for all \(x\) and \[\sum_{x=1}^40.1x=0.1(1+2+3+4)=0.1\frac{4(5)}{2}=1.\] Computing \(F(x)=P(X\leq x)\) yields \[F(x)=\begin{cases} 0 & \text{if $x<1$} \\ 0.1 & \text{if $1\leq x<2$} \\ 0.3 & \text{if $2\leq x<3$} \\ 0.6 & \text{if $3\leq x<4$} \\ 1 & \text{if $x\geq 4$} \end{cases}\] The p.m.f. and the c.d.f. for this r.v. are shown in Figure 3.5.
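A quick `R` sketch of this computation, using `cumsum()` to accumulate the p.m.f. into the c.d.f.:

```r
# p.m.f. f(x) = 0.1x on x = 1,2,3,4; the c.d.f. is its running total
x <- 1:4
pmf <- 0.1 * x
cdf <- cumsum(pmf)
rbind(x, pmf, cdf)    # cdf jumps: 0.1, 0.3, 0.6, 1.0
```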

### 3.2.2 Expectation of a Discrete Random Variable

The **expectation** of a discrete random variable \(X\) is
\[{\text{E} [X]}=\sum_x x\cdot P(X=x)=\sum_{x}xf(x)\,,\] where the sum
extends over all values of \(x\) taken by \(X\).

The definition can be extended to a general function of \(X\): \[\text{E}[u(X)]=\sum_{x}u(x)P(X=x)=\sum_xu(x)f(x).\] As an important example, note that \[\text{E}[X^2]=\sum_x x^2P(X=x)=\sum_xx^2f(x).\]

**Examples:**

What is the expectation of the roll \(Z\) of a \(6-\)sided die?

**Answer:** if the die is fair, then \[\begin{aligned} \text{E} [Z]&=\sum_{z=1}^6z\cdot P(Z=z) =\frac{1}{6}\sum_{z=1}^6z\\&=\frac{1}{6}\cdot\frac{6(7)}{2}=3.5.\end{aligned}\]

For each \(\$1\) bet in a gambling game, a player can win \(\$3\) with probability \(\frac{1}{3}\) and lose \(\$1\) with probability \(\frac{2}{3}\). Let \(X\) be the net gain/loss from the game. Find the expected value of the game.

**Answer:** \(X\) can take on the value \(\$2\) for a win and \(-\$2\) for a loss (outcome \(-\) bet). The expected value of \(X\) is thus \[\text{E}[X]=2\cdot\frac{1}{3}+(-2)\cdot\frac{2}{3}=-\frac{2}{3}.\]

If \(Z\) is the number showing on a roll of a fair \(6-\)sided die, find \(\text{E} [Z^2]\) and \(\text{E} [(Z-3.5)^2]\).

**Answer:** \[\begin{aligned} \text{E}[Z^2]&= \sum_z z^2P(Z=z) = \frac{1}{6}\sum_{z=1}^6z^2 =\frac16(1^2+\cdots+6^2)=\frac{91}{6},\\ \text{E}[(Z-3.5)^2]&=\sum_{z=1}^6(z-3.5)^2P(Z=z)=\frac{1}{6}\sum_{z=1}^6(z-3.5)^2 \\ &=\frac{(1-3.5)^2+\cdots+(6-3.5)^2}{6}=\frac{35}{12}.\end{aligned}\]
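These expectations can be checked numerically as weighted sums (a short `R` sketch):

```r
# Expectations of the fair-die roll Z as weighted sums
z <- 1:6
p <- rep(1/6, 6)
EZ <- sum(z * p)                 # 3.5
EZ2 <- sum(z^2 * p)              # 91/6
Edev2 <- sum((z - 3.5)^2 * p)    # 35/12
c(EZ, EZ2, Edev2)
```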

The expectation of a random variable is simply the average value that it takes, over all its possible values, weighted by their probabilities.

#### Mean and Variance

We can interpret the expectation as the average or the **mean** of \(X\),
which we often denote by \(\mu=\mu_X\). For instance, in the example of
the fair die, \[\mu_Z=\text{E}[Z]=3.5.\]

Note that in the final example, we could have written
\[\text{E}[ (Z-3.5)^2 ]=\text{E}[ (Z-\text{E}[Z])^2 ].\] This is an important quantity
associated to a random variable \(X\), its **variance** \(\text{Var}[X]\).

The variance of a discrete random variable \(X\) is the **expected squared
difference from the mean**: \[\begin{aligned}
\text{Var} (X)&= \text{E} [ (X-\mu_X)^2]= \sum_{x} (x-\mu_X)^2P(X=x)\\&=\sum_{x}\left(x^2-2x\mu_X+\mu_X^2\right)f(x) \\&= \sum_{x}x^2f(x)-2\mu_X\sum_{x}xf(x)+\mu_X^2\sum_{x}f(x)\\&= \text{E}[X^2]-2\mu_X\mu_X+\mu_X^2\cdot 1 \\ &=\text{E}[X^2]-\mu_X^2.\end{aligned}\]
This is also sometimes written as \(\text{Var}[X]=\text{E}[X^2]-\text{E}^2[X]\).
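The shortcut formula can be confirmed numerically, e.g. for the fair-die roll (an `R` sketch):

```r
# Var[X] = E[X^2] - (E[X])^2, checked on the fair-die roll
z <- 1:6; p <- rep(1/6, 6)
mu <- sum(z * p)
var.def <- sum((z - mu)^2 * p)    # from the definition
var.alt <- sum(z^2 * p) - mu^2    # from the shortcut formula
c(var.def, var.alt)               # both 35/12
```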

#### Standard Deviation

The **standard deviation** of a discrete random variable \(X\) is defined
directly from the variance: \[\text{SD}[X]=\sqrt{\text{Var} [X]}\,.\] The mean is a
measure of **centrality** and it gives an idea as to where the **bulk**
of a distribution is located; the variance and standard deviation
provide information about the **spread** – distributions with higher
variance/SD are **more spread out about the average**.

**Example:** let \(X\) and \(Y\) be random variables with the following p.d.f.

| \(x\) | \(P(X=x)\) | \(y\) | \(P(Y=y)\) |
|---|---|---|---|
| \(-2\) | \(1/5\) | \(-4\) | \(1/5\) |
| \(-1\) | \(1/5\) | \(-2\) | \(1/5\) |
| \(0\) | \(1/5\) | \(0\) | \(1/5\) |
| \(1\) | \(1/5\) | \(2\) | \(1/5\) |
| \(2\) | \(1/5\) | \(4\) | \(1/5\) |

Compute the expected values and compare the variances.

**Answer:** we have \(\text{E} [X]=\text{E} [Y]=0\) and \[2=\text{Var}[X]<\text{Var}[Y]=8,\] meaning that we
would expect both distributions to be centered at \(0\), but \(Y\) should be
more spread-out than \(X\) (because its variance is greater, see Figure 3.6).
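A brief `R` sketch of these computations:

```r
# Means and variances of X and Y from the table above
x <- c(-2, -1, 0, 1, 2)
y <- c(-4, -2, 0, 2, 4)
p <- rep(1/5, 5)
EX <- sum(x * p); EY <- sum(y * p)    # both 0
VX <- sum(x^2 * p) - EX^2             # 2
VY <- sum(y^2 * p) - EY^2             # 8
c(EX, EY, VX, VY)
```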

#### Properties

Let \(X,Y\) be random variables and \(a\in \mathbb{R}\). Then

- \(\text{E} [aX]=a\text{E}[X]\);
- \(\text{E} [X+a]= \text{E}[X]+a\);
- \(\text{E} [X+Y]=\text{E}[X]+\text{E}[Y]\);
- in general, \(\text{E} [XY]\neq \text{E}[X]\text{E}[Y]\);
- \(\text{Var}[aX]=a^2\text{Var}[X]\), \(\text{SD}[aX]=|a|\text{SD}[X]\);
- \(\text{Var}[X+a]=\text{Var} [X]\), \(\text{SD}[X+a]=\text{SD} [X]\).

### 3.2.3 Binomial Distributions

Recall that the number of unordered samples of size \(r\) from a set of size \(n\) is \[_nC_r=\binom{n}{r}=\frac{n!}{(n-r)!r!}.\]

**Examples**

\(2!\times 4!=(1\times 2)\times (1\times 2\times 3\times 4)=48\), but \((2\times 4)!=8!=40320\).

\(\binom 5 1=\frac{5!}{1!\times 4!}=\frac{1\times 2\times 3\times 4\times 5}{1\times (1\times 2\times 3\times 4)}=\frac{5}{1}=5\).

In general: \(\binom n 1=n\) and \(\binom n 0=1\).

\(\binom 6 2=\frac{6!}{2!\times 4!}=\frac{4!\times 5\times 6}{2!\times 4!}=\frac{5\times 6}{2}=15\).

\(\binom {27} {22}=\frac{27!}{22!\times 5!}=\frac{22!\times 23\times 24\times 25\times 26\times 27}{5!\times 22!}=\frac{23\times 24\times 25\times 26\times 27}{120}\).
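In `R`, binomial coefficients are provided by the base function `choose()`:

```r
# Binomial coefficients via the base function choose()
choose(5, 1)     # 5
choose(6, 2)     # 15
choose(27, 22)   # 80730
```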

#### Binomial Experiments

A **Bernoulli trial** is a random experiment with two possible outcomes,
“success” and “failure”. Let \(p\) denote the probability of a success.

A **binomial experiment** consists of \(n\) repeated *independent* Bernoulli trials, each with the same probability of success, \(p\), such as:

- female/male births (perhaps not truly independent, but often treated as such);
- satisfactory/defective items on a production line;
- sampling with replacement with two types of item;
- etc.

#### Probability Mass Function

In a binomial experiment of \(n\) independent events, each with
probability of success \(p\), the number of successes \(X\) is a discrete
random variable that follows a **binomial distribution** with parameters
\((n,p)\): \[f(x)=P(X=x)=\binom nx p^x(1-p)^{n-x}\,,\ \text{ for
$x=0,1,2,\ldots,n$.}\] This is often abbreviated to “\(X\sim\mathcal{B}(n,p)\)”.

If \(X\sim \mathcal{B}(1,p)\), then \(P(X=0)=1-p\) and \(P(X=1)=p\), so \[\text{E} [X]=(1-p)\cdot0 + p\cdot1=p\,.\]

#### Expectation and Variance

If \(X\sim \mathcal{B}(n,p)\), it can be shown that \[\text{E} [X]= \sum_{x=0}^n xP(X=x) =np,\] and \[\text{Var}[X]= \text{E}\left[(X-np)^2 \right] = \sum_{x=0}^n (x-np)^2 P(X=x)=np(1-p)\] (we will eventually see an easier way to derive these formulas by interpreting \(X\) as a sum of other discrete random variables).
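These formulas can be verified numerically with the base function `dbinom()`; a sketch using \(n=12\) and \(p=0.1\) (the parameters of the example that follows):

```r
# E[X] = np and Var[X] = np(1-p), checked for X ~ B(12, 0.1)
n <- 12; p <- 0.1
x <- 0:n
f <- dbinom(x, size = n, prob = p)
EX <- sum(x * f)                # np = 1.2
VarX <- sum((x - EX)^2 * f)     # np(1-p) = 1.08
c(EX, VarX)
```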

Recognizing that certain situations can be modeled *via* a distribution
whose p.m.f. and c.d.f. are already known can simplify eventual
computations.

**Examples:**

Suppose that water samples taken in some well-defined region have a \(10\%\) probability of being polluted. If \(12\) samples are selected independently, then it is reasonable to model the number \(X\) of polluted samples as \(\mathcal{B}(12,0.1)\).

Find

1. \(\text{E} [X]\) and \(\text{Var}[X]\);
2. \(P(X=3)\);
3. \(P(X\leq 3)\).

**Answer:** if \(X\sim\mathcal{B}(n,p)\), then \[\text{E} [X]=np\quad\text{and}\quad \text{Var}[X]=np(1-p).\] With \(n=12\) and \(p=0.1\), we obtain \[\begin{aligned} \text{E} [X]&= 12\times0.1=1.2;\\ \text{Var}[X]&=12\times0.1\times0.9=1.08\,.\end{aligned}\]

By definition, \[P(X=3)=\binom{12}3(0.1)^3(0.9)^{9}\approx0.0852.\]

By definition, \[\begin{aligned} P(X\leq 3)&=\sum_{x=0}^3P(X=x) \\&=\sum_{x=0}^3\binom{12}{x}(0.1)^x(0.9)^{12-x}. \end{aligned}\] This sum can be computed directly; however, for \(X\sim \mathcal{B}(12,0.1)\), \(P(X\leq 3)\) can also be read directly from tabulated values (as in Figure 3.7):

The appropriate value \(\approx 0.9744\) can be found in the group corresponding to \(n=12\), in the row corresponding to \(x=3\), and in the column corresponding to \(p=0.1\).

The table can also be used to compute \[\begin{aligned} P(X=3)&=P(X\leq 3)-P(X\leq 2)\\&=0.9744-0.8891\approx 0.0853.\end{aligned}\]
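Both the tabulated value and the difference can be reproduced with the base functions `pbinom()` and `dbinom()` (a sketch):

```r
# Tabulated binomial values, reproduced with base R
pbinom(3, size = 12, prob = 0.1)           # P(X <= 3) ~ 0.9744
pbinom(3, 12, 0.1) - pbinom(2, 12, 0.1)    # P(X = 3) as a c.d.f. difference
dbinom(3, size = 12, prob = 0.1)           # same value, directly
```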

An airline sells \(101\) tickets for a flight with \(100\) seats. Each passenger with a ticket is known to have a probability \(p=0.97\) of showing up for their flight. What is the probability of \(101\) passengers showing up (and the airline being caught overbooking)? Make appropriate assumptions. What if the airline sells 125 tickets?

**Answer:** let \(X\) be the number of passengers that show up. We want to compute \(P(X>100)\). If all passengers show up independently of one another (no families or late buses?), we can model \(X\sim \mathcal{B}(101,0.97)\) and \[\begin{aligned} P(X&>100)=P(X=101)\\&=\binom{101}{101}(0.97)^{101}(0.03)^0\approx 0.046.\end{aligned}\] If the airline sells \(n=125\) tickets, we can model the situation with the binomial distribution \(\mathcal{B}(125,0.97)\), so that \[\begin{aligned} P(X&>100)=1-P(X\leq 100)\\&=1-\sum_{x=0}^{100}\binom{125}{x}(0.97)^x(0.03)^{125-x}.\end{aligned}\] This sum is harder to compute directly, but it is very nearly \(1\) (try it in `R`, say). Do these results match your intuition?
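A sketch of both computations in `R`:

```r
# Overbooking probabilities for the airline example
p.101 <- 0.97^101                                  # P(X = 101), X ~ B(101, 0.97)
p.125 <- 1 - pbinom(100, size = 125, prob = 0.97)  # P(X > 100), X ~ B(125, 0.97)
c(p.101, p.125)                                    # ~0.046, and very nearly 1
```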

We can evaluate related probabilities in `R` *via* the base functions `rbinom()`, `dbinom()`, etc., whose parameters are `n`, `size`, and `prob`.

We can draw an observation \(X\) from a binomial distribution \(\mathcal{B}(11,0.2)\) in `R` as follows:

`[1] 5`
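The call producing such a draw is presumably of the following form (a sketch; without a fixed seed, the value drawn will vary):

```r
# One draw from B(11, 0.2): n = number of observations,
# size = number of trials, prob = success probability
x <- rbinom(n = 1, size = 11, prob = 0.2)
x   # an integer between 0 and 11 (e.g. 5)
```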

We could also replicate the process 1000 times (and extract the empirical expectation and variance):

```
[1] 2.236
[1] 1.794098
```
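A sketch of the replication step (the exact empirical values vary from run to run; the true mean and variance are \(np=2.2\) and \(np(1-p)=1.76\)):

```r
# 1000 independent draws from B(11, 0.2), with empirical mean and variance
draws <- rbinom(n = 1000, size = 11, prob = 0.2)
mean(draws)   # close to np = 2.2
var(draws)    # close to np(1-p) = 1.76
```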

The histogram of the sample is shown below.

If we change the parameters of the distribution (\(\mathcal{B}(19,0.7)\)), we get a different-looking histogram (and a different expectation and variance).

```
[1] 13.308
[1] 4.253389
```

### 3.2.4 Geometric Distributions

Now consider a sequence of Bernoulli trials, with probability \(p\) of
success at each step. Let the **geometric** random variable \(X\) denote
the number of steps until the first success occurs. The probability
mass function is given by
\[f(x)=P(X=x)=(1-p)^{x-1}p,\quad x=1,2,\ldots \] denoted
\(X\sim \text{Geo}(p)\). For this random variable, we have
\[\text{E}[X]=\frac{1}{p} \quad\mbox{and}\quad \text{Var}[X]=\frac{1-p}{p^2}.\]

**Examples:**

A fair \(6-\)sided die is thrown until it shows a \(6\). What is the probability that \(5\) throws are required?

**Answer:** if \(5\) throws are required, we have to compute \(P(X=5)\), where \(X\sim\text{Geo}(1/6)\): \[P(X=5)=(1-p)^{5-1}p=(5/6)^4(1/6)\approx 0.0804.\]

In the example above, how many throws would you expect to need?

**Answer:**\(\text{E}[X]=\frac{1}{1/6}=6\).
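In `R`, geometric probabilities are available via `dgeom()`; note that R counts the number of *failures* before the first success, so the argument is shifted by one (a sketch):

```r
# P(X = 5) for X ~ Geo(1/6); dgeom() counts failures before the first success
dgeom(5 - 1, prob = 1/6)    # ~0.0804
(5/6)^4 * (1/6)             # same value, from the formula
```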

### 3.2.5 Negative Binomial Distributions

Consider now a sequence of Bernoulli trials, with probability \(p\) of
success at each step. Let the **negative binomial** random variable \(X\)
denote the number of steps until the \(r\)th success occurs.

The probability mass function is given by \[f(x)=P(X=x)=\binom{x-1}{r-1}(1-p)^{x-r}p^r,\quad x=r,r+1,\ldots\] which we denote by \(X\sim \text{NegBin}(p,r)\).

For this random variable, we have \[\text{E}[X]=\frac{r}{p} \quad\mbox{and}\quad \text{Var}[X]=\frac{r(1-p)}{p^2}.\]

**Examples:**

A fair \(6-\)sided die is thrown until three \(6\)’s are rolled. What is the probability that \(5\) throws are required?

**Answer:** if \(5\) throws are required, we have to compute \(P(X=5)\), where \(X\sim \text{NegBin}(1/6,3)\): \[\begin{aligned} P(X=5)&=\binom{5-1}{3-1}(1-p)^{5-3}p^3\\&=\binom{4}{2}(5/6)^2(1/6)^3\approx 0.0193. \end{aligned}\]

In the example above, how many throws would you expect to need?

**Answer:**\(\text{E}[X]=\frac{3}{1/6}=18\).
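Similarly, `dnbinom()` counts the failures before the \(r\)th success, so the argument is shifted by \(r\) (a sketch):

```r
# P(X = 5) for X ~ NegBin(1/6, 3); dnbinom() counts failures before the r-th success
dnbinom(5 - 3, size = 3, prob = 1/6)    # ~0.0193
choose(4, 2) * (5/6)^2 * (1/6)^3        # same value, from the formula
```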

### 3.2.6 Poisson Distributions

Let us say we are counting the number of “changes” that occur in a
continuous interval of time or space.^{27}

We have a **Poisson process** with rate \(\lambda\), denoted by \(\mathcal{P}(\lambda)\), if:

- the numbers of changes occurring in non-overlapping intervals are **independent**;
- the probability of exactly one change in a short interval of length \(h\) is approximately \(\lambda h\), and
- the probability of \(2+\) changes in a sufficiently short interval is essentially \(0\).

Assume that an experiment satisfies the above properties. Let \(X\) be the
number of changes in a **unit interval** (this could be \(1\) day, or \(15\)
minutes, or \(10\) years, etc.).

What is \(P(X=x)\), for \(x=0, 1, \ldots\)? We can get to the answer by first partitioning the unit interval into \(n\) disjoint sub-intervals of length \(1/n\). Then,

- by the second condition, the probability of one change occurring in one of the sub-intervals is approximately \(\lambda/n\);
- by the third condition, the probability of \(2+\) changes is \(\approx 0\), and
- by the first condition, we have a sequence of \(n\) Bernoulli trials with probability \(p=\lambda/n\).

Therefore, \[\begin{aligned} f(x)&=P(X=x) \approx \frac{n!}{x!(n-x)!}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x} \\ &=\frac{\lambda^x}{x!}\cdot\underbrace{\frac{n!}{(n-x)!}\cdot\frac{1}{n^x}}_{\text{term $1$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{n}}_{\text{term $2$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{-x}}_{\text{term $3$}}.\end{aligned}\] Letting \(n\to\infty\), we obtain \[\begin{aligned} P(X=x)&=\lim_{n\to\infty}\frac{\lambda^x}{x!}\cdot\underbrace{\frac{n!}{(n-x)!}\cdot\frac{1}{n^x}}_{\text{term $1$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{n}}_{\text{term $2$}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{-x}}_{\text{term $3$}} \\ &=\frac{\lambda^x}{x!}\cdot 1 \cdot \exp(-\lambda)\cdot 1 = \frac{\lambda^xe^{-\lambda}}{x!}, \quad x=0, 1, \ldots \end{aligned}\] Let \(X\sim \mathcal{P}(\lambda)\). Then it can be shown that \[\text{E} [X]=\lambda \quad\mbox{and}\quad \text{Var} [X]=\lambda,\] that is, the mean and the variance of a Poisson random variable are identical.
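The limiting argument can be illustrated numerically: for large \(n\), the \(\mathcal{B}(n,\lambda/n)\) probabilities are nearly indistinguishable from the Poisson ones (an `R` sketch with \(\lambda=2\), an arbitrary choice):

```r
# B(n, lambda/n) probabilities approach P(lambda) probabilities as n grows
lambda <- 2
x <- 0:8
binom.approx <- dbinom(x, size = 10000, prob = lambda / 10000)
pois.exact <- dpois(x, lambda)
max(abs(binom.approx - pois.exact))    # very small
```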

We can compute related probabilities in `R` *via* the base functions `rpois()`, `dpois()`, etc., with required parameters `n` and `lambda`.

We can draw a sample of size 1 from \(\mathcal{P}(13)\) in `R` as follows:

`[1] 18`
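The call is presumably of the following form (a sketch; the drawn value varies between runs):

```r
# One draw from a Poisson distribution with rate lambda = 13
x <- rpois(n = 1, lambda = 13)
x   # a non-negative integer, typically near 13 (e.g. 18)
```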

We sample independently 500 times; this yields an empirical expectation and variance.

```
[1] 13 12 14 12 18 9
[1] 12.874
[1] 12.92798
```
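A sketch of the sampling step (empirical values vary from run to run):

```r
# 500 independent draws from P(13); mean and variance are both near lambda
draws <- rpois(n = 500, lambda = 13)
head(draws)
mean(draws)   # close to 13
var(draws)    # also close to 13
```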

The sample’s histogram is shown below.

**Examples:**

A traffic flow is typically modeled by a Poisson distribution. It is known that the traffic flowing through an intersection is \(6\) cars/minute, on average. What is the probability of no cars entering the intersection in a \(30\) second period?

**Answer:** \(6\) cars/min \(=\) \(3\) cars/\(30\) sec. Thus \(\lambda=3\), and we need to compute \[P(X=0)=\frac{3^0e^{-3}}{0!}=\frac{e^{-3}}{1}\approx 0.0498.\]

A hospital needs to schedule night shifts in the maternity ward. It is known that there are \(3000\) deliveries per year; if these happened randomly round the clock (is this a reasonable assumption?), we would expect \(1000\) deliveries between the hours of midnight and 8.00 a.m., a time when much of the staff is off-duty.

It is thus important to ensure that the night shift is sufficiently staffed to allow the maternity ward to cope with the workload on any particular night, or at least, on a high proportion of nights.

The average number of deliveries per night is \[\lambda = 1000/365.25\approx 2.74.\] If the nightly number \(X\) of deliveries follows a Poisson process \(\mathcal{P}(\lambda)\), we can compute the probability of delivering \(x=0,1,2,\ldots\) babies on each night.

For a Poisson distribution, the probability mass values \(f(x)\) can be obtained using `dpois()` (for a general distribution, replace the `r` in the `rxxxxx(...)` random number generators by `d`: `dxxxxx(...)`). We set up the Poisson distribution parameters and the distribution’s range (in theory, it goes to infinity, but we have got to stop somewhere in practice).
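A sketch of this setup (the cutoff at \(x=10\) and the variable names are assumptions; the rounded rate \(\lambda=2.74\) reproduces the tabulated values that follow):

```r
# Parameters and truncated range for the nightly-deliveries distribution
lambda <- 2.74    # rounded rate: 1000/365.25 ~ 2.74
x <- 0:10         # truncated support; the tail beyond 10 is negligible
pmf <- dpois(x, lambda)
cdf <- ppois(x, lambda)
round(cbind(x, pmf, cdf), 7)
```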

The probability mass and cumulative distribution functions are shown below:

| \(x\) | p.m.f. | c.d.f. |
|---|---|---|
| 0 | 0.0645703 | 0.0645703 |
| 1 | 0.1769228 | 0.2414931 |
| 2 | 0.2423842 | 0.4838773 |
| 3 | 0.2213775 | 0.7052548 |
| 4 | 0.1516436 | 0.8568984 |
| 5 | 0.0831007 | 0.9399991 |
| 6 | 0.0379493 | 0.9779484 |
| 7 | 0.0148544 | 0.9928029 |
| 8 | 0.0050876 | 0.9978905 |
| 9 | 0.0015489 | 0.9994394 |
| 10 | 0.0004244 | 0.9998638 |

Here is the p.m.f. plot:

```
plot(x, pmf, type="h", col=2, main="Poisson PMF", xlab="x", ylab="f(x)=P(X=x)")
points(x, pmf, col=2)
abline(h=0, col=4)
```

and the c.d.f. plot:

If the maternity ward wants to prepare for the greatest possible traffic on \(80\%\) of the nights, how many deliveries should be expected?

**Answer:** we seek an \(x\) for which \[P(X\leq x-1)\leq 0.80\leq P(X\leq x).\] Let’s plot the height \(F(x)=0.8\) on the c.d.f.:

```
plot(c(1,x), c(0,cdf), type="s", col=2, main="Poisson CDF", xlab="x", ylab="F(x)=P(X<=x)")
abline(h=0:1, col=4)
abline(h=0.8, col=1)
```

The \(y=0.8\) line crosses the c.d.f. at \(x=4\); let’s evaluate \(F(3)=P(X\leq 3)\) and \(F(4)=P(X\leq 4)\) to confirm that \(F(3)\leq 0.8 \leq F(4)\).

```
[1] 0.7052548
[1] 0.8568984
```
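These two values are presumably obtained with `ppois()` (a sketch, using the rounded rate \(\lambda=2.74\) that matches the tabulated values):

```r
# F(3) and F(4) for the nightly-deliveries distribution
lambda <- 2.74
ppois(3, lambda)   # F(3) ~ 0.7052548
ppois(4, lambda)   # F(4) ~ 0.8568984
```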

It is indeed the case. Thus, if the hospital prepares for \(4\) deliveries a night, they will be ready for the worst on at least \(80\%\) of the nights (closer to \(85.7\%\), actually).

Note that this is different than asking how many deliveries are expected nightly (namely, \(\text{E}[X]=2.74\)).

On how many nights in the year would \(5\) or more deliveries be expected?

**Answer:** we need to evaluate \[\begin{aligned} 365.25\cdot P(X\ge 5)&=365.25 (1-P(X\le 4)). \end{aligned}\]

`[1] 52.26785`

Thus, roughly \(14\%\) of the nights.
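A one-line sketch of this computation:

```r
# Expected number of nights per year with 5 or more deliveries
365.25 * (1 - ppois(4, 2.74))   # ~52.27 nights, i.e. roughly 14% of nights
```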

Over the course of one year, what is the greatest number of deliveries expected on any night?

**Answer:** we are looking for the largest value of \(x\) for which \(365.25\cdot P(X=x)\geq 1\) (if \(365.25\cdot P(X=x)<1\), then the probability of that number of deliveries is too low to expect that we would ever see it during the year). The expected number of nights with each number of deliveries can be computed using:

```
           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
        0.00000  1.00000  2.00000  3.00000  4.00000  5.00000  6.00000 7.000000
nights 23.58432 64.62103 88.53082 80.85815 55.38783 30.35253 13.86099 5.425587
           [,9]    [,10]      [,11]
       8.000000 9.000000 10.0000000
nights 1.858264 0.565738  0.1550122
```
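A sketch of this computation (variable names are assumptions):

```r
# Expected number of nights per year with each number of deliveries
lambda <- 2.74
x <- 0:10
nights <- 365.25 * dpois(x, lambda)
rbind(x, nights)
max(x[nights >= 1])   # largest count still expected at least once a year: 8
```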

The largest such value is:

`[1] 8`

Namely, \(x=8\). Indeed, for larger values of \(x\), \(365.25\cdot P(X=x)<1\).

```
[1] 1.858264
[1] 0.565738
```

### 3.2.7 Other Discrete Distributions

Wikipedia [30] lists other common discrete distributions:

- the **Rademacher** distribution, which takes values \(1\) and \(-1\), each with probability \(1/2\);
- the **beta binomial** distribution, which describes the number of successes in a series of independent Bernoulli experiments with heterogeneity in the success probability;
- the **discrete uniform** distribution, where all elements of a finite set are equally likely (balanced coin, unbiased die, first card of a well-shuffled deck, etc.);
- the **hypergeometric** distribution, which describes the number of successes in the first \(m\) of a series of \(n\) consecutive Bernoulli experiments, if the total number of successes is known;
- the **negative hypergeometric** distribution, which describes the number of attempts needed to get the \(n\)th success in a series of Bernoulli experiments;
- the **Poisson binomial** distribution, which describes the number of successes in a series of independent Bernoulli experiments with different success probabilities;
- **Benford’s Law**, which describes the frequency of the first digit of many naturally occurring data;
- **Zipf’s Law**, which describes the frequency of words in the English language;
- the **beta negative binomial** distribution, which describes the number of failures needed to obtain \(r\) successes in a sequence of independent Bernoulli experiments;
- etc.