Understanding a common proof for linearity of expectation


For any two discrete random variables $X,Y:\Omega \to \mathbb{R}$ in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, linearity of expectation tells us that: $$\mathbb{E}(X+Y) := \sum_{t \in (X+Y)(\Omega)}t\cdot\mathbb{P}(X+Y = t) = \mathbb{E}X + \mathbb{E}Y$$
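As a concrete sanity check of this identity, the two sides can be compared directly on a small finite space with exact rational arithmetic (the four-point space and the values of $X$ and $Y$ below are a made-up toy example):

```python
from fractions import Fraction

# Toy probability space: Omega = {0, 1, 2, 3} with uniform weights (hypothetical example).
P = {w: Fraction(1, 4) for w in range(4)}
X = {0: 1, 1: 2, 2: 1, 3: 5}   # values X(omega)
Y = {0: 3, 1: 0, 2: 3, 3: 1}   # values Y(omega)

def expectation(Z, P):
    """E[Z] computed from the image of Z: sum over t of t * P(Z = t)."""
    return sum(t * sum(p for w, p in P.items() if Z[w] == t)
               for t in set(Z.values()))

S = {w: X[w] + Y[w] for w in P}  # the random variable X + Y, defined pointwise
assert expectation(S, P) == expectation(X, P) + expectation(Y, P)
```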

Here I use $(X+Y)(\Omega)$ to denote the image of $X + Y$ and $X + Y = t$ to denote the event $\{\omega \in \Omega\,\mid\, X(\omega) + Y(\omega) = t\}$. In university lecture notes, textbooks, and online forums (like this one) you see a lot of proofs of this fact that begin something like this: $$\mathbb{E}(X + Y) := \sum_{x \in X(\Omega)} \sum_{y \in Y(\Omega)} (x + y)\cdot\mathbb{P}(X = x, Y = y)$$ How in any way does this coincide with the definition of expectation that I gave above? I realize that there is an equivalent definition (namely, $\mathbb{E}X = \sum_{\omega \in \Omega} X(\omega) \mathbb{P}(\{\omega\})$) that trivializes the proof. The way I prefer to prove this fact is to first prove equivalence between the two definitions and then use the second definition to prove linearity. However, I find that the above method of proof is very common in other areas of probability theory aside from just linearity of expectation, so I'd like to better understand it.

To me this feels a lot like the Law of the Unconscious Statistician: the first step of the proof is actually a large jump in reasoning that appears intuitively true if given little thought (though it is currently very unintuitive to me). How do I go about proving equivalence between the two sums? And does the same hold in other scenarios? For example: $$\mathbb{E}(XY) := \sum_{t \in (XY)(\Omega)} t \cdot \mathbb{P}(XY = t) \stackrel{??}{=} \sum_{x \in X(\Omega)} \sum_{y \in Y(\Omega)} xy\cdot\mathbb{P}(X = x, Y = y)$$

Does the above hold? Help is appreciated, thanks in advance.
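(For what it's worth, the product identity is easy to probe numerically before proving it; here is a quick check on a made-up three-point space, using exact fractions so the equality is not an artifact of rounding:)

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy space and random variables, purely for illustration.
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
X = {0: 1, 1: 2, 2: 2}
Y = {0: 3, 1: 3, 2: 5}

def joint(x, y):
    """P(X = x, Y = y)."""
    return sum(p for w, p in P.items() if X[w] == x and Y[w] == y)

Z = {w: X[w] * Y[w] for w in P}  # the random variable XY

# Left side: sum over the image of XY.
lhs = sum(t * sum(p for w, p in P.items() if Z[w] == t) for t in set(Z.values()))
# Right side: double sum over the separate images of X and Y.
rhs = sum(x * y * joint(x, y) for x, y in product(set(X.values()), set(Y.values())))
assert lhs == rhs
```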

There are 3 answers below.

Best Answer

$\newcommand{\R}{\mathbf R}$ $\newcommand{\E}{\mathbb E}$

It will be useful to be acquainted with the notion of push-forward of a (probability) measure. I will be stating things for finite spaces but everything can be done in full generality.

Let $\Omega_1$ and $\Omega_2$ be two finite sets and $\mu$ be a probability measure on $\Omega_1$. Let $f:\Omega_1\to \Omega_2$ be any map. We define the push-forward of $\mu$ under $f$ as the probability measure $\nu$ given by $$ \nu(B_2) = \mu(f^{-1}(B_2)) $$ for all subsets $B_2$ of $\Omega_2$. The push-forward of $\mu$ by $f$ will be denoted by $f_*\mu$. Note that the push-forward behaves well under composition. More precisely, if $f:\Omega_1\to \Omega_2$ and $g:\Omega_2\to \Omega_3$ are two maps, and $\mu$ is a probability measure on $\Omega_1$, then $(g\circ f)_*\mu = g_*(f_*\mu)$.
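On a finite space the push-forward is a short computation: push the mass of each point $w$ to $f(w)$ and add up masses over fibers. A minimal sketch (the function name and the example measure are mine), including a check of the composition property:

```python
from fractions import Fraction

def pushforward(mu, f):
    """f_* mu: move the mass of each point w to f(w), summing over fibers f^{-1}(v)."""
    nu = {}
    for w, p in mu.items():
        nu[f(w)] = nu.get(f(w), Fraction(0)) + p
    return nu

# Hypothetical measure on Omega1 = {0, 1, 2}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
f = lambda w: w % 2        # Omega1 -> Omega2 = {0, 1}
g = lambda v: v + 10       # Omega2 -> Omega3

# (g o f)_* mu == g_* (f_* mu)
assert pushforward(mu, lambda w: g(f(w))) == pushforward(pushforward(mu, f), g)
```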

It is an easy verification that if $Y:\Omega_2\to \R$ is a map, and $f:\Omega_1\to \Omega_2$ is any map and $\mu$ is a probability measure on $\Omega_1$, then $$ \E_\mu[Y\circ f] = \E_{f_*\mu}[Y] $$
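The verification is indeed easy, and it can also be replayed numerically; a sketch with a made-up measure and maps:

```python
from fractions import Fraction

# Hypothetical data: mu on Omega1 = {0, 1, 2}, f into Omega2 = {0, 1}, Y into R.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
f = lambda w: w % 2       # Omega1 -> Omega2
Y = lambda v: 10 * v + 1  # Omega2 -> R

# E_mu[Y o f]: integrate Y o f against mu directly.
lhs = sum(Y(f(w)) * p for w, p in mu.items())

# E_{f_* mu}[Y]: first push mu forward along f, then integrate Y.
nu = {}
for w, p in mu.items():
    nu[f(w)] = nu.get(f(w), Fraction(0)) + p
rhs = sum(Y(v) * p for v, p in nu.items())

assert lhs == rhs
```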

With this in hand, suppose we have a probability space $(\Omega, \mu)$ and two random variables $X, Y:\Omega\to \R$. Define a map $Z:\Omega\to \R\times \R$ as $Z(\omega)=(X(\omega), Y(\omega))$ and let $a:\R\times \R\to \R$ be the addition map, that is, $a(x, y)=x+y$. Now $\E_\mu[X+Y]= \E_\mu[a\circ Z] = \E_{Z_*\mu}[a]$. But $$ \E_{Z_*\mu}[a] = \sum_{(x, y)\in \text{image}(Z)} a(x, y) (Z_*\mu)(x, y) = \sum_{(x, y)\in \text{image}(Z)} (x+y)\mu(X=x, Y=y) $$ Note that one can replace $\text{image}(Z)$ above by anything that contains $\text{image}(Z)$, in particular, one can write $\text{image}(X)\times \text{image}(Y)$ in place of $\text{image}(Z)$. So we have $$ \E_{\mu}[X+Y] = \sum_{(x, y)\in \text{image}(X)\times \text{image}(Y)} (x+y)\mu(X=x, Y=y) $$
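The "enlarge the index set" step is worth checking by hand: pairs outside $\text{image}(Z)$ carry zero joint mass, so adding them changes nothing. A small numeric confirmation (toy data again):

```python
from fractions import Fraction
from itertools import product

# Hypothetical space and random variables.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
X = {0: 1, 1: 2, 2: 2}
Y = {0: 3, 1: 5, 2: 3}

def joint(x, y):
    """mu(X = x, Y = y)."""
    return sum(p for w, p in mu.items() if X[w] == x and Y[w] == y)

# Sum over image(Z), i.e. only the pairs actually hit by Z = (X, Y).
image_Z = {(X[w], Y[w]) for w in mu}
over_image = sum((x + y) * joint(x, y) for x, y in image_Z)

# Sum over the full product image(X) x image(Y); the extra pairs have joint mass 0.
over_product = sum((x + y) * joint(x, y)
                   for x, y in product(set(X.values()), set(Y.values())))
assert over_image == over_product
```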

One can play the same game to obtain a similar expression for $\E_\mu[XY]$. Just replace the addition map $a$ by the multiplication map $(x, y)\mapsto xy:\R\times \R\to \R$.

Answer

This is an application of the law of total probability: if $\{A_1, A_2, A_3, \dots\}$ is a finite or countably infinite partition of the sample space $\Omega$, then for any event $B$ we have $$ P[B] = \sum_{i} P[B \cap A_i]$$


Let $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ be a function and define $Z=f(X,Y)$. Let $S_X$, $S_Y$, and $S_{Z}$ be the (discrete) sets of all possible values of the discrete random variables $X, Y$, and $Z$, respectively. A convenient partition of $\Omega$ is $$ \Omega = \cup_{x \in S_X} \cup_{y \in S_Y} \{X=x, Y=y\}$$ So for each $t \in S_Z$ we have \begin{align} P[f(X,Y)=t] &= \sum_{x \in S_X}\sum_{y \in S_Y} P[\{f(X,Y)=t\} \cap \{X=x, Y=y\}]\\ &=\sum_{x \in S_X}\sum_{y \in S_Y} P[\{f(x,y)=t\} \cap \{X=x, Y=y\}]\\ &= \sum_{x \in S_X} \sum_{y \in S_Y} 1_{\{f(x,y)=t\}} P[X=x,Y=y] \quad (Eq. 1) \end{align} where $1_A$ is an indicator function that equals $1$ if event $A$ occurs and $0$ otherwise. So \begin{align} E[Z] &=\sum_{t \in S_{Z}} t P[f(X,Y)=t] \\ &\overset{(a)}{=} \sum_{t \in S_Z} t\left[\sum_{x \in S_X} \sum_{y \in S_Y} 1_{\{f(x,y)=t\}} P[X=x,Y=y]\right]\\ &=\sum_{x \in S_X} \sum_{y \in S_Y} P[X=x,Y=y]\left(\sum_{t \in S_Z} t 1_{\{f(x,y)=t\}}\right)\\ &= \sum_{x \in S_X} \sum_{y \in S_Y} P[X=x,Y=y]f(x,y) \end{align} where (a) holds by (Eq. 1).
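This derivation holds for any $f$, which can be replayed numerically, e.g. with $f(x,y)=\max(x,y)$ (the joint distribution below is hypothetical):

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of (X, Y); unlisted pairs have probability 0.
P_joint = {(1, 3): Fraction(1, 2), (2, 3): Fraction(1, 4), (2, 5): Fraction(1, 4)}
f = max  # any f: R^2 -> R works here

S_X = {x for x, _ in P_joint}
S_Y = {y for _, y in P_joint}
S_Z = {f(x, y) for (x, y) in P_joint}

def P(x, y):
    return P_joint.get((x, y), Fraction(0))

# E[Z] via (Eq. 1): expand P[f(X,Y) = t] with indicators over the partition.
lhs = sum(t * sum(P(x, y)
                  for x, y in product(S_X, S_Y) if f(x, y) == t)
          for t in S_Z)
# E[Z] via the double sum over the values of X and Y.
rhs = sum(f(x, y) * P(x, y) for x, y in product(S_X, S_Y))
assert lhs == rhs
```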

Answer

Just to complement the existing answers, I would like to mention the following viewpoint, from which the equivalence between your two definitions might be considered "trivial".

The probability $\mathbb{P}(f(X,Y)=t)$ is equal to $$ \sum_{x,y; f(x,y)=t}\mathbb{P}(X=x,Y=y). $$

From here it follows almost immediately that \begin{align} \sum_t g(t)\mathbb{P}(f(X,Y)=t) &= \sum_t \sum_{x,y; f(x,y)=t} g(t)\mathbb{P}(X=x,Y=y)\\ &= \sum_{x,y} g(f(x,y))\mathbb{P}(X=x,Y=y). \end{align}
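This, too, is easy to check numerically for a nontrivial $g$, say $g(t)=t^2$ with $f(x,y)=x+y$ (the joint distribution is made up for illustration):

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of (X, Y); unlisted pairs have probability 0.
P_joint = {(1, 3): Fraction(1, 2), (2, 3): Fraction(1, 4), (2, 5): Fraction(1, 4)}
f = lambda x, y: x + y
g = lambda t: t * t

pairs = list(product({x for x, _ in P_joint}, {y for _, y in P_joint}))
P = lambda x, y: P_joint.get((x, y), Fraction(0))

# Left side: sum over values t, grouping the pairs with f(x, y) = t.
lhs = sum(g(t) * sum(P(x, y) for x, y in pairs if f(x, y) == t)
          for t in {f(x, y) for x, y in pairs})
# Right side: sum directly over the pairs (x, y).
rhs = sum(g(f(x, y)) * P(x, y) for x, y in pairs)
assert lhs == rhs
```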

The results you are interested in are the special case $g(t)=t$ and $f(x,y)=x+y$ or $f(x,y) = xy$.