Probability generating function of the sum of two random variables


Let two $\mathbf{N}$-valued random variables $X$ and $Y$ be given, and let $\phi_X(s) = \sum_k \mathbf{P}(X = k) s^k$ and $\phi_Y(s) = \sum_k \mathbf{P}(Y = k) s^k$ be their respective probability generating functions.

It is stated in Probability and Statistics by Example, Suhov and Kelbert, p. 59, that the following two conditions are equivalent.

(i) $X$ and $Y$ are independent.

(ii) $\phi_{X + Y}(s) = \phi_X(s) \phi_Y(s)$.

I don't have a problem with the fact that (i) implies (ii). However, I don't understand why the converse is true.

The only justification offered in the book is that it follows from the uniqueness of the coefficients of a power series. But I don't understand why the fact that $X + Y$ has the same distribution as if $X$ and $Y$ were independent ought to imply that they are in fact independent.

Can anybody fill in the details of the argument, or else refer me to an easily accessible source? Thanks.

Two answers are given below.

Answer 1

Given that

$$\phi_{X+Y}(s) = \sum_{k\in\mathcal D(X+Y)} \mathsf P(X+Y=k)\; s^k = \sum_{x\in\mathcal D(X)} \;\sum_{y\in \mathcal D(Y)} \mathsf P(X=x, Y=y)\; s^{x+y},$$

and that

$$\phi_X(s)\,\phi_Y(s) = \Bigl(\sum_{x\in\mathcal D(X)} \mathsf P(X=x)\; s^x \Bigr)\cdot\Bigl( \sum_{y\in\mathcal D(Y)} \mathsf P(Y=y)\; s^y\Bigr) = \sum_{x\in\mathcal D(X)}\sum_{y\in\mathcal D(Y)} \mathsf P(X=x)\,\mathsf P(Y=y)\; s^{x+y},$$

and that independence means

$$X\bot Y \iff \mathsf P(X=x , Y=y) = \mathsf P(X=x)\,\mathsf P(Y=y) \quad\text{for all } x, y,$$

independence is a sufficient condition for the pgf of the sum $X+Y$ to equal the product of the pgfs of $X$ and $Y$:

$$X\bot Y \implies \phi_{X+Y}(s) = \phi_X(s)\,\phi_Y(s).$$

Note, however, that matching the coefficients of $s^n$ in the two expansions only forces $\sum_{x+y=n}\mathsf P(X=x,Y=y) = \sum_{x+y=n}\mathsf P(X=x)\,\mathsf P(Y=y)$ for each $n$, not term-by-term equality of the joint probabilities, so this argument alone does not yield the converse.
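As a numerical sanity check of the forward direction (i) $\implies$ (ii), here is a short sketch: under independence the pmf of $X+Y$ is the convolution of the two pmfs, so the pgf of the sum factors. The specific pmfs chosen below are hypothetical, not from the question.

```python
import numpy as np

# Hypothetical pmfs of two independent N-valued random variables:
# X ~ Binomial(2, 1/2) on {0,1,2}, Y ~ uniform on {0,1,2}
px = np.array([0.25, 0.5, 0.25])
py = np.array([1/3, 1/3, 1/3])

def pgf(pmf, s):
    """Evaluate phi(s) = sum_k P(k) s^k."""
    return sum(p * s**k for k, p in enumerate(pmf))

# Under independence, the pmf of X+Y is the convolution of the pmfs,
# hence phi_{X+Y}(s) = phi_X(s) * phi_Y(s) for every s.
p_sum = np.convolve(px, py)
for s in (0.3, 0.7, 1.0):
    assert abs(pgf(p_sum, s) - pgf(px, s) * pgf(py, s)) < 1e-12
```

The assertion holds because multiplying two polynomials is exactly convolving their coefficient sequences.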


Edit Summary

Updated to clarify the domains of the sum, and why we can pass from summing over all $k$ in the domain of $X+Y$ to a double sum over all $x$ in the domain of $X$ and all $y$ in the domain of $Y$. It is essentially a statement about expectations.

$$\begin{align} \sum_{z\in \mathcal D(X+Y)} s^z\;\mathsf P(X+Y=z) & = \sum_{z\in \mathcal D(X+Y)} s^z\; \sum_{x\in\mathcal D(X)} \mathsf P(X=x\cap X+Y=z) & \text{law of total probability} \\ & = \sum_{z\in \mathcal D(X+Y)} s^z \sum_{x\in\mathcal D(X)} \mathsf P(X=x\cap Y=z-x) & \text{equivalence of events} \\ & = \sum_{x\in\mathcal D(X)} \sum_{z\in \mathcal D(X+Y\mid X=x)} s^{z}\, \mathsf P(X=x\cap Y=z-x) & \text{reordering the summations} \\ & = \sum_{x\in\mathcal D(X)} \sum_{y\in \mathcal D(Y)} s^{x+y}\, \mathsf P(X=x \cap Y=y) & \text{change of index }y=z-x \\[3ex] \text{Alternatively:} \\ \mathsf E[s^{X+Y}] & = \mathsf E[\mathsf E[s^{X+Y}\mid X]] & \text{tower property of conditional expectation} \\ & = \mathsf E[s^X\, \mathsf E[s^{Y}\mid X]] & \text{taking out what is known} \\[3ex] \therefore \mathsf E[s^{X+Y}] & = \mathsf E[s^X\, \mathsf E[s^Y]] & \text{by independence} \\ & = \mathsf E[s^X]\times \mathsf E[s^Y] & \text{by linearity of expectation} \end{align}$$
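The reordering of summations above holds for any joint law, independent or not. A small numerical sketch (with a hypothetical, deliberately non-independent joint pmf) checks that summing $s^z\,\mathsf P(X+Y=z)$ over $z$ agrees with the double sum of $s^{x+y}\,\mathsf P(X=x, Y=y)$:

```python
import numpy as np

# Hypothetical joint pmf of (X, Y) on {0,1,2} x {0,1,2}; not independent.
r = np.array([[0.10, 0.05, 0.10],
              [0.05, 0.20, 0.10],
              [0.15, 0.05, 0.20]])
assert abs(r.sum() - 1.0) < 1e-12

s = 0.6

# Left side: sum over z of s^z * P(X+Y = z).
p_sum = np.zeros(5)
for x in range(3):
    for y in range(3):
        p_sum[x + y] += r[x, y]
lhs = sum(p_sum[z] * s**z for z in range(5))

# Right side: double sum over (x, y) of s^(x+y) * P(X=x, Y=y).
rhs = sum(r[x, y] * s**(x + y) for x in range(3) for y in range(3))

assert abs(lhs - rhs) < 1e-12
```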

Answer 2

A dimension-counting argument shows that the implication (ii) $\implies$ (i) cannot hold in general.

Let $p_n=P(X=n)$, $q_n=P(Y=n)$ and $r_{n,k}=P(X=n,Y=k)$ for every $n$ and $k$. Then, as the OP knows, (ii) means that for every nonnegative $n$, $$P(X+Y=n)=\sum_{k=0}^nP(X=k)P(Y=n-k).$$ On the other hand, $$[X+Y=n]=\bigcup_{k=0}^n[X=k,Y=n-k],$$ and the union on the RHS is disjoint, hence (ii) is equivalent to the condition that, for every $n$, $$\sum_{k=0}^nr_{k,n-k}=\sum_{k=0}^np_kq_{n-k}.$$ Furthermore, for every $(n,k)$, $$\sum_{i=0}^\infty r_{n,i}=p_n,\qquad\sum_{i=0}^\infty r_{i,k}=q_k.$$ Assume that $0\leqslant X\leqslant N$ and $0\leqslant Y\leqslant M$ almost surely and that $(p_n)$ and $(q_n)$ are fixed; then the distribution of $(X,Y)$ depends on $NM$ free parameters $r_{n,k}$. On the other hand, $0\leqslant X+Y\leqslant N+M$ almost surely, hence condition (ii) is an affine system of $N+M+1$ equations.

If $NM\gt N+M+1$ (for example, $N=M=3$), the system cannot fully determine the joint distribution, hence the claim that the identity $\varphi_{X+Y}=\varphi_X\cdot\varphi_Y$ implies the independence of $X$ and $Y$ does not hold.
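This counting can be checked explicitly (a sketch, not part of the original answer): for $N=M=3$, write the constraints (row sums, column sums, anti-diagonal sums of the $4\times 4$ joint table) as a linear system in the 16 unknowns $r_{n,k}$ and observe it is underdetermined, so two distinct joint laws can satisfy it.

```python
import numpy as np
from math import comb

# Constraint matrix for N = M = 3: 4 row sums (law of X), 4 column sums
# (law of Y), and 7 anti-diagonal sums (law of X+Y), acting on the
# flattened 4x4 joint table r_{n,k}.
rows = []
for n in range(4):
    e = np.zeros((4, 4)); e[n, :] = 1; rows.append(e.ravel())
for k in range(4):
    e = np.zeros((4, 4)); e[:, k] = 1; rows.append(e.ravel())
for z in range(7):
    e = np.zeros((4, 4))
    for n in range(max(0, z - 3), min(3, z) + 1):
        e[n, z - n] = 1
    rows.append(e.ravel())
A = np.array(rows)

# 15 equations cannot pin down 16 unknowns: the system is underdetermined.
assert np.linalg.matrix_rank(A) < 16

# Two distinct couplings satisfying exactly the same system: the independent
# product of two Binomial(3, 1/2) pmfs, and the dependent table given below.
b3 = np.array([comb(3, j) for j in range(4)]) / 8
indep = np.outer(b3, b3)
dep = np.array([[1, 6, 1, 0], [0, 14, 9, 1],
                [0, 4, 14, 6], [7, 0, 0, 1]]) / 64
assert np.allclose(A @ indep.ravel(), A @ dep.ravel())
assert not np.allclose(indep, dep)
```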

Here is a distribution on the set $\{0,1,2,3\}\times\{0,1,2,3\}$ such that, if $(X,Y)$ has this distribution, then $X$ and $Y$ are both binomial $(3,\frac12)$ and $X+Y$ is binomial $(6,\frac12)$, and yet $(X,Y)$ is not independent (for simplicity, the node $(n,k)$ carries the weight $64\cdot r_{n,k}$; for example $r_{1,2}=\frac9{64}$): $$\begin{matrix}(3)&7&0&0&1\\(2)&0&4&14&6\\(1)&\color{red}{0}&14&9&1\\(0)&1&6&1&0\\n/k&(0)&(1)&(2)&(3)\end{matrix}$$ The sums of the rows and of the columns are proportional to $(1,3,3,1)$, the third line of Pascal's triangle, and the sums of the diagonals in the direction NW-SE are $(1,6,15,20,15,6,1)$, the sixth line of Pascal's triangle; hence $X$, $Y$ and $X+Y$ have the desired distributions. Yet $(X,Y)$ is not independent since, for example, $P(X=1,Y=0)=\color{red}{0}\ne\frac38\cdot\frac18$.
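The claims about this table can be verified mechanically; the following sketch checks the marginals, the law of the sum, and the failure of independence:

```python
import numpy as np
from math import comb

# Joint weights 64 * r_{n,k} from the table above (rows n = 0..3, cols k = 0..3).
w = np.array([[1, 6, 1, 0],
              [0, 14, 9, 1],
              [0, 4, 14, 6],
              [7, 0, 0, 1]]) / 64

# Marginals: both Binomial(3, 1/2).
b3 = np.array([comb(3, j) for j in range(4)]) / 8
assert np.allclose(w.sum(axis=1), b3)  # row sums -> distribution of X
assert np.allclose(w.sum(axis=0), b3)  # column sums -> distribution of Y

# Anti-diagonal sums: X + Y is Binomial(6, 1/2).
b6 = np.array([comb(6, j) for j in range(7)]) / 64
conv = np.array([sum(w[n, z - n] for n in range(4) if 0 <= z - n <= 3)
                 for z in range(7)])
assert np.allclose(conv, b6)

# Yet (X, Y) is not independent: P(X=1, Y=0) = 0 != P(X=1) P(Y=0) = 3/8 * 1/8.
assert w[1, 0] == 0 and b3[1] * b3[0] > 0
```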

Another example, based on the uniform distributions on the set $\{0,1,2,3\}$ (this time, the node $(n,k)$ carries the weight $16\cdot r_{n,k}$): $$\begin{matrix}(3)&3&0&0&1\\(2)&0&0&2&2\\(1)&0&2&1&1\\(0)&1&2&1&0\\n/k&(0)&(1)&(2)&(3)\end{matrix}$$ And finally a minimal example, based on the uniform distributions on the set $\{0,1,2\}$ (this time, the node $(n,k)$ carries the weight $9\cdot r_{n,k}$): $$\begin{matrix}(2)&2&0&1\\(1)&0&1&2\\(0)&1&2&0\\n/k&(0)&(1)&(2)\end{matrix}$$
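The minimal $3\times 3$ example admits the same mechanical check: both marginals are uniform on $\{0,1,2\}$, the sum has the law of two independent uniforms, yet the coupling is not independent.

```python
import numpy as np

# Joint weights 9 * r_{n,k} from the minimal table above (rows n = 0..2, cols k = 0..2).
w = np.array([[1, 2, 0],
              [0, 1, 2],
              [2, 0, 1]]) / 9

# Both marginals are uniform on {0, 1, 2}.
assert np.allclose(w.sum(axis=1), 1/3)
assert np.allclose(w.sum(axis=0), 1/3)

# X + Y has the same law as the sum of two independent uniforms...
u = np.full(3, 1/3)
conv = np.array([sum(w[n, z - n] for n in range(3) if 0 <= z - n <= 2)
                 for z in range(5)])
assert np.allclose(conv, np.convolve(u, u))

# ...but (X, Y) is not independent: P(X=0, Y=2) = 0 != 1/3 * 1/3.
assert w[0, 2] == 0
```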