3.4 Joint Distributions

Let \(X\), \(Y\) be two continuous random variables. The joint probability density function (joint p.d.f.) of \(X,Y\) is a function \(f(x,y)\) satisfying:

  1. \(f(x,y)\geq 0\), for all \(x\), \(y\);

  2. \(\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x,y)\, dxdy=1\), and

  3. \(P(A)=\iint_Af(x,y)\, dxdy\), for any region \(A\subseteq \mathbb{R}^2\).

For discrete random variables, the analogous object is the joint probability mass function (joint p.m.f.); the properties are the same, except that integrals are replaced by sums, and we add the requirement that \(f(x,y)\leq 1\) for all \(x,y\).

Property 3 implies that \(P(A)\) is the volume of the solid over the region \(A\) in the \(xy\) plane bounded by the surface \(z=f(x,y)\).


Examples:

  • Roll a pair of unbiased dice. For each of the \(36\) possible outcomes, let \(X\) denote the smaller roll, and \(Y\) the larger roll (taken from [32]).

    1. How many outcomes correspond to the event \(A=\{(X=2,Y=3)\}\)?

      Answer: the rolls \((3,2)\) and \((2,3)\) both give rise to event \(A\).

    2. What is \(P(A)\)?

      Answer: there are \(36\) possible outcomes, so \(P(A)=\frac{2}{36}\approx 0.0556\).

    3. What is the joint p.m.f. of \(X,Y\)?

      Answer: only one outcome, \((a,a)\), gives rise to the event \(\{X=Y=a\}\). For every event \(\{X=x,Y=y\}\) with \(x<y\), two outcomes do the trick: \((x,y)\) and \((y,x)\). The joint p.m.f. is thus \[f(x,y)=\begin{cases}1/36 & \text{$1\leq x=y\leq 6$} \\ 2/36 & \text{$1\leq x<y\leq 6$}\end{cases}\] The first property is automatically satisfied, as is the third (by construction). There are only \(6\) pairs for which \(X=Y\); the remaining \(15\) pairs in the support have \(X<Y\), each arising from two of the \(36\) outcomes.

      Thus, \[\sum_{x=1}^6\sum_{y=x}^6 f(x,y)=6\cdot\frac{1}{36}+15\cdot \frac{2}{36}=1.\]


      Figure 3.17: Joint p.m.f. for \(X\) and \(Y\) in the dice example [32].
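
      This p.m.f. is easy to verify in R; a quick sketch, enumerating all \(36\) equally likely outcomes:

      rolls <- expand.grid(d1 = 1:6, d2 = 1:6)  # all 36 outcomes
      X <- pmin(rolls$d1, rolls$d2)             # smaller roll
      Y <- pmax(rolls$d1, rolls$d2)             # larger roll
      pmf <- table(X, Y)/36                     # joint p.m.f. as a 6 x 6 table
      sum(pmf)                                  # 1, as required by property 2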

    4. Compute \(P(X=a)\) and \(P(Y=b)\), for \(a,b=1,\ldots, 6\).

      Answer: for every \(a=1,\ldots,6\), \(\{X=a\}\) corresponds to the following union of events: \[\begin{aligned} \{X=a,Y=a\}\cup &\{X=a,Y=a+1\}\cup \cdots \\ &\cdots \cup \{X=a,Y=6\}.\end{aligned}\] These events are mutually exclusive, so that \[\begin{aligned} P(X=a)&=\sum_{y=a}^6P(\{X=a,Y=y\})\\&=\frac{1}{36}+\sum_{y=a+1}^6\frac{2}{36} \\ &=\frac{1}{36}+\frac{2(6-a)}{36}, \quad a=1,\ldots, 6.\end{aligned}\] Similarly, we get \[P(Y=b)=\frac{1}{36}+\frac{2(b-1)}{36},\ b=1,\ldots, 6.\] These marginal probabilities can be found in the margins of the joint p.m.f. table (whence the name).
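
      In R, continuing with the pmf table built in the sketch above, the marginals are simply the row and column sums:

      rowSums(pmf)  # P(X = a): 11/36, 9/36, 7/36, 5/36, 3/36, 1/36
      colSums(pmf)  # P(Y = b): 1/36, 3/36, 5/36, 7/36, 9/36, 11/36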

    5. Compute \(P(X=3\mid Y>3)\), \(P(Y\le 3 \mid X\geq 4)\).

      Answer: the notation suggests how to compute these conditional probabilities: \[\begin{aligned} P(X=3\mid Y>3)&=\frac{P(\{X=3\} \cap \{Y>3\})}{P(Y>3)} \\ P(Y\le 3\mid X\geq 4)&=\frac{P(\{Y\le 3\} \cap \{X\geq 4\})}{P(X \geq 4)}.\end{aligned}\] In Figure 3.18, the region corresponding to \(P(Y>3)=\frac{27}{36}\) is shaded in red, and the region corresponding to \(P(X=3)=\frac{7}{36}\) is shaded in blue. The region corresponding to \[P(X=3\cap Y>3)=\frac{6}{36}\] is the intersection of the two regions, so \[P(X=3\mid Y>3)=\frac{6/36}{27/36}=\frac{6}{27}\approx 0.2222.\] Since \(X\leq Y\) always, \(P(Y\le 3\cap X\ge 4)=0\), and therefore \(P(Y\le 3\mid X\ge 4)=0\).


      Figure 3.18: Conditional probabilities in the dice example [32].
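
      Both conditional probabilities can be verified by direct enumeration, re-using the outcome vectors X and Y from the earlier sketch:

      sum(X == 3 & Y > 3)/sum(Y > 3)  # 6/27, about 0.2222
      sum(Y <= 3 & X >= 4)            # 0: impossible, since X <= Y by construction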

    6. Are \(X\) and \(Y\) independent?

      Answer: why didn’t we simply use the multiplicative rule to compute \[P(X=3 \cap Y>3)=P(X=3)P(Y>3)?\] It’s because \(X\) and \(Y\) are not independent, that is, it is not always the case that \[P(X=x,Y=y)=P(X=x)P(Y=y)\] for all allowable \(x,y\).

      As it is, \(P(X=1,Y=1)=\frac{1}{36}\), but \[P(X=1)P(Y=1)=\frac{11}{36}\cdot \frac{1}{36}\neq \frac{1}{36},\] so \(X\) and \(Y\) are dependent (this is often the case when the domain of the joint p.d.f./p.m.f. is not rectangular).
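
      Numerically, re-using the pmf table from above, the joint p.m.f. and the product of the marginals clearly disagree:

      pX <- rowSums(pmf)
      pY <- colSums(pmf)
      max(abs(pmf - outer(pX, pY)))   # about 0.038, not 0: X and Y are dependent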

  • There are \(8\) similar chips in a bowl: three marked \((0,0)\), two marked \((1,0)\), two marked \((0,1)\) and one marked \((1,1)\). A player selects a chip at random and is given the sum of the two coordinates, in dollars (taken from [32]).

    1. What is the joint probability mass function of \(X_1\) and \(X_2\)?

      Answer: let \(X_1\) and \(X_2\) represent the coordinates; we have \[f(x_1,x_2)=\frac{3-x_1-x_2}{8},\quad x_1,x_2=0,1.\]

    2. What is the expected pay-off for this game?

      Answer: the pay-off is simply \(X_1+X_2\). The expected pay-off is thus \[\begin{aligned} \text{E}[X_1+X_2]&=\sum_{x_1=0}^1\sum_{x_2=0}^1(x_1+x_2)f(x_1,x_2)\\&=0\cdot \frac{3}{8}+1\cdot\frac{2}{8}+1\cdot \frac{2}{8}+2\cdot \frac{1}{8}\\&=0.75. \end{aligned}\]
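
      A one-line check in R, with the four chip types and their probabilities as above:

      x1 <- c(0, 1, 0, 1)
      x2 <- c(0, 0, 1, 1)
      f <- (3 - x1 - x2)/8   # 3/8, 2/8, 2/8, 1/8
      sum((x1 + x2)*f)       # 0.75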

  • Let \(X\) and \(Y\) have joint p.d.f. \[f(x,y)=2,\quad 0\leq y\leq x\leq 1.\]

    1. What is the support of \(f(x,y)\)?

      Answer: the support is the set \(S=\{(x,y):0\leq y\leq x\leq 1\}\), a triangle in the \(xy\) plane bounded by the \(x\)-axis, the line \(x=1\), and the line \(y=x\).

      The support is the blue triangle shown below.


      Figure 3.19: Support for the joint distribution of \(X\) and \(Y\) in the above example.

    2. What is \(P(0\leq X\leq 0.5, 0\leq Y\leq 0.5)\)?

      Answer: we integrate \(f\) over the portion of the support with \(0\leq x\leq 0.5\) (see Figure 3.19): \[\begin{aligned} P(0\leq &X\leq 0.5,0\leq Y\leq 0.5)\\&=P(0\leq X\leq 0.5,0\leq Y\leq X)\\& =\int_{0}^{0.5}\int_{0}^x2\, dydx\\&=\int_0^{0.5}\left[2y\right]_{y=0}^{y=x}\, dx \\ & =\int_{0}^{0.5}2x\, dx=1/4.\end{aligned}\]
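
      One way to sanity-check this value is a small Monte Carlo sketch: since \(f\equiv 2\) on the support, \(P(A)=2\cdot\text{area}(A\cap S)\), and the area can be estimated with uniformly random points on the unit square.

      set.seed(314)                # arbitrary seed, for reproducibility
      u <- runif(1e6)
      v <- runif(1e6)
      2*mean(v <= u & u <= 0.5)    # approximately 0.25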

    3. What are the marginal p.d.f.s \(f_X(x)\) and \(f_Y(y)\)?

      Answer: for \(0\leq x\leq 1\), we get \[\begin{aligned} f_X(x)&=\int_{-\infty}^{\infty}f(x,y)\, dy\\ &=\int_{y=0}^{y=x}2\, dy=\left[2y\right]_{y=0}^{y=x}=2x, \end{aligned}\] and for \(0\leq y\leq 1\), \[\begin{aligned} f_Y(y)&=\int_{-\infty}^{\infty}f(x,y)\, dx=\int_{x=y}^{x=1}2\, dx\\ &=\left[2x\right]_{x=y}^{x=1}=2-2y.\end{aligned}\] (Note that for continuous random variables \(P(X=x)=0\) for every \(x\); the marginals are densities, not probabilities.)

    4. Compute \(\text{E}[X]\), \(\text{E}[Y]\), \(\text{E}[X^2]\), \(\text{E}[Y^2]\), and \(\text{E}[XY]\).

      Answer: we have \[\begin{aligned} \text{E}[X]&=\iint_Sxf(x,y)\, dA =\int_{0}^{1}\int_{0}^x2x\, dydx\\&=\int_0^1\left[2xy\right]_{y=0}^{y=x}\, dx = \int_{0}^1 2x^2\, dx \\&=\left[\frac{2}{3}x^3\right]_{0}^{1}=\frac{2}{3};\\ \text{E}[Y]&=\iint_Syf(x,y)\, dA =\int_{0}^{1}\int_{y}^12y\, dxdy\\&=\int_0^1\left[2xy\right]_{x=y}^{x=1}\, dy = \int_{0}^1 (2y-2y^2)\, dy \\&=\left[y^2-\frac{2}{3}y^3\right]_{0}^{1}=\frac{1}{3}; \\ \text{E}[X^2]&=\iint_Sx^2f(x,y)\, dA =\int_{0}^{1}\int_{0}^x2x^2\, dydx\\&=\int_0^1\left[2x^2y\right]_{y=0}^{y=x}\, dx = \int_{0}^1 2x^3\, dx \\&=\left[\frac{1}{2}x^4\right]_{0}^{1}=\frac{1}{2};\\ \text{E}[Y^2] &= \iint_Sy^2f(x,y)\, dA =\int_{0}^{1}\int_{y}^12y^2\, dxdy\\&=\int_0^1\left[2xy^2\right]_{x=y}^{x=1}\, dy = \int_{0}^1 (2y^2-2y^3)\, dy \\&=\left[\frac{2}{3}y^3-\frac{1}{2}y^4\right]_{0}^{1}=\frac{1}{6};\\ \text{E}[XY]&=\iint_Sxyf(x,y)\, dA=\int_{0}^1\int_{0}^x2xy\, dydx\\&=\int_0^1\left[xy^2\right]_{y=0}^{y=x}\, dx=\int_{0}^1x^3\, dx \\ &=\left[\frac{x^4}{4}\right]_{0}^1=\frac{1}{4}.\end{aligned}\]
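
      These moments can be spot-checked by simulation; a minimal sketch, using inverse-transform sampling: the marginal \(f_X(x)=2x\) has c.d.f. \(F_X(x)=x^2\), so \(X=\sqrt{U}\) for uniform \(U\), and given \(X=x\), \(Y\) is uniform on \([0,x]\) (as \(f(y\mid x)=f(x,y)/f_X(x)=1/x\)).

      set.seed(314)
      x <- sqrt(runif(1e6))   # X = sqrt(U) has density 2x on [0, 1]
      y <- runif(1e6)*x       # Y | X = x is Uniform(0, x)
      c(mean(x), mean(y), mean(x^2), mean(y^2), mean(x*y))
      # approximately 2/3, 1/3, 1/2, 1/6, 1/4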

    5. Are \(X\) and \(Y\) independent?

      Answer: they are not independent, as the support of the joint p.d.f. is not rectangular; indeed, \(f(x,y)=2\) on the support, whereas \(f_X(x)f_Y(y)=2x(2-2y)=4x(1-y)\neq 2\) in general.

The covariance of two random variables \(X\) and \(Y\) can give some indication of how they depend on one another: \[\begin{aligned} \text{Cov}(X,Y)&=\text{E}[(X-\text{E}[X])(Y-\text{E}[Y])]\\&=\text{E}[XY]-\text{E}[X]\text{E}[Y].\end{aligned}\] When \(X=Y\), the covariance reduces to the variance.


Example: in the last example, \(\text{Var}[X]=\frac{1}{2}-\left(\frac{2}{3}\right)^2=\frac{1}{18}\), \(\text{Var}[Y]=\frac{1}{6}-\left(\frac{1}{3}\right)^2=\frac{1}{18}\), and \(\text{Cov}(X,Y)=\frac{1}{4}-\frac{2}{3}\cdot\frac{1}{3}=\frac{1}{36}\).
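
The sample drawn in the simulation sketch above reproduces these values, up to simulation error:

var(x)      # approximately 1/18 = 0.0556
var(y)      # approximately 1/18
cov(x, y)   # approximately 1/36 = 0.0278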

In R, we can sample from a multivariate normal distribution via MASS's mvrnorm(), whose main parameters are the sample size n, a mean vector mu, and a covariance matrix Sigma.

Let’s start with a standard bivariate normal, with mean \(\mu=(0,0)\) and covariance matrix \(\Sigma=\begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix}\).

mu <- rep(0, 2)                       # mean vector (0, 0)
Sigma <- matrix(c(1, 0, 0, 1), 2, 2)  # identity covariance matrix

We sample 1000 observations from the joint normal \(N(\mu,\Sigma)\).

library(MASS)
a <- mvrnorm(1000, mu, Sigma)  # 1000 draws from N(mu, Sigma)
a <- data.frame(a)
str(a)
'data.frame':   1000 obs. of  2 variables:
 $ X1: num  -0.5267 -0.3153 -0.4686 0.6782 0.0105 ...
 $ X2: num  0.1095 -0.721 -0.0943 0.1005 1.7259 ...

What would you expect to see when we plot the data?

library(ggplot2)
library(hexbin)
qplot(X1, X2, data=a, geom="hex")

The covariance matrix was the identity (diagonal), so we expect the blob to be circular. What happens if we use a non-diagonal covariance matrix?

mu <- c(-3, 12)
Sigma <- matrix(c(110, 15, 15, 3), 2, 2)  # off-diagonal entries: Cov(X1, X2) = 15
a <- mvrnorm(1000, mu, Sigma)
a <- data.frame(a)
qplot(X1, X2, data=a, geom="hex") + ylim(-40,40) + xlim(-40,40)
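
As a sanity check, the sample mean vector and sample covariance matrix should be close to the specified mu and Sigma, up to sampling variability:

colMeans(a)   # close to mu = (-3, 12)
cov(a)        # close to Sigma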

References

[32]
R. V. Hogg and E. A. Tanis, Probability and Statistical Inference, 7th ed. Pearson/Prentice Hall, 2006.