Conditional expectation of a sum of random variables


I am given a set of i.i.d. random variables $\{ X_1, X_2, \dots, X_n\}$. Let $\mathbf{Z} = X_1 + X_2 + \dots + X_n$ and define $\mathbf{E}_i(\mathbf{Z}) = \mathbf{E}(\mathbf{Z} \mid X_1, X_2, \dots, X_i)$ and $\Delta_i = \mathbf{E}_i - \mathbf{E}_{i-1}$ (with $\mathbf{E}_0 = \mathbf{E}$).

A book I follow states that if $j > i$, then $\mathbf{E}_i(\Delta_j) = 0$. I tried to work it out, and in the end it boils down to calculating

$\mathbf{E}(\mathbf{E}(\mathbf{Z} \mid X_1, \dots, X_j) \mid X_1, \dots, X_i)$. What is the definition of the conditional expectation of a conditioned variable, and how can I calculate it?

Best answer:

Since your question is answered almost immediately by the tower rule, I will instead try to give some insight into conditional expectation and how you can understand the tower rule.

Let me try to answer this on two levels:


The conditional expectation of $X$ given $Y$, commonly written $\mathbb{E}[X \mid Y]$, is what you would guess $X$ to be if you knew $Y$.

For instance, let $X = Y + Z$, where $Y$ and $Z$ are independent. Then $\mathbb{E}[X \mid Y] = \mathbb{E}[Y + Z \mid Y]$. Now let's think about this: what is the best guess for $Y + Z$ given $Y$? It is $Y + \mathbb{E}[Z]$, i.e. the part I know, $Y$, plus what I would expect $Z$ to be.
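This "best guess" intuition can be checked by exact enumeration on a small finite probability space. The distributions below are hypothetical values chosen purely for illustration; we verify that $\mathbb{E}[Y + Z \mid Y = y] = y + \mathbb{E}[Z]$ for each value $y$:

```python
# Sketch: Y and Z are independent discrete variables; check by enumeration
# that E[Y + Z | Y = y] = y + E[Z] for every value y of Y.
Y_vals = {1: 0.5, 2: 0.5}      # P(Y = y), illustrative values
Z_vals = {0: 0.25, 4: 0.75}    # P(Z = z), illustrative values

EZ = sum(z * p for z, p in Z_vals.items())   # E[Z] = 3.0

for y in Y_vals:
    # conditioning on Y = y: Z keeps its own distribution (independence)
    cond = sum((y + z) * pz for z, pz in Z_vals.items())
    assert abs(cond - (y + EZ)) < 1e-12
    print(f"E[X | Y={y}] = {cond} = {y} + E[Z]")
```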

Now we can try to understand the tower rule from this simple viewpoint: $\mathbb{E}[\mathbb{E}[Z \mid X_1, X_2] \mid X_1] = \mathbb{E}[Z \mid X_1]$, since in the outer step of the iterated expectation on the left-hand side we give up our information about $X_2$ and are left only with our knowledge of $X_1$. Similarly, $\mathbb{E}[\mathbb{E}[Z \mid X_1] \mid X_1, X_2] = \mathbb{E}[Z \mid X_1]$, since the inner step of the iterated expectation on the left-hand side already discards the information about $X_2$.

Thus, we get the tower rule (or law of iterated expectation):

$\mathbb{E}[\mathbb{E}[Z \mid X_1]\mid X_1, X_2]$ = $ \mathbb{E}[\mathbb{E}[Z \mid X_1, X_2]\mid X_1]$ = $ \mathbb{E}[Z\mid X_1]$.
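As a sanity check, the tower rule can be verified exactly on a finite space. In this sketch (an assumed toy setup, not from the original question) $X_1, X_2, X_3$ are independent fair bits and $Z = X_1 + X_2 + X_3$; conditional expectation given some coordinates is just the average over the outcomes that agree on those coordinates:

```python
import itertools

# Verify E[E[Z | X1, X2] | X1] = E[Z | X1] by exact enumeration.
omega = list(itertools.product([0, 1], repeat=3))  # outcomes (x1, x2, x3)

def cond_exp(f, keys):
    """E[f | the coordinates in `keys`], as a dict: fixed coords -> average."""
    groups = {}
    for w in omega:
        k = tuple(w[i] for i in keys)
        groups.setdefault(k, []).append(f(w))
    return {k: sum(v) / len(v) for k, v in groups.items()}

Z = lambda w: sum(w)
inner = cond_exp(Z, [0, 1])                          # E[Z | X1, X2]
lhs = cond_exp(lambda w: inner[(w[0], w[1])], [0])   # then condition on X1
rhs = cond_exp(Z, [0])                               # E[Z | X1] directly
assert lhs == rhs
print(lhs)  # {(0,): 1.0, (1,): 2.0}
```

Conditioning on $X_1$ alone averages the inner conditional expectation over $X_2$, which is exactly the "giving up information" step described above.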


More formally, $\mathbb{E}[X \mid Y]$ is actually a shorthand for $\mathbb{E}[X \mid \sigma(Y)]$.

To understand this, we need to know what is meant by $\sigma(Y)$. $\sigma(Y)$ is a collection of sets of outcomes (subsets of your sample space $\Omega$): for every "nice" set $B$ of values, the set of all outcomes $w$ that $Y$ maps into $B$ belongs to $\sigma(Y)$. For instance, $\sigma(Y)$ contains the sets $A_1 = \{w \in \Omega \mid Y(w) = 5\}$ and $A_2 = \{w \in \Omega \mid Y(w) > 10 \}$, since if I tell you $Y = 5$, then you can safely say that the outcome that produced it is in $A_1$.

(If you know what is meant by the Borel sigma-algebra $\mathcal{B}$: $\sigma(Y)$ is the collection of sets $A \subset \Omega$ such that $A = Y^{-1}(B)$ for some $B \in \mathcal{B}$.)
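On a finite space this preimage description becomes very concrete: $\sigma(Y)$ is exactly the collection of unions of level sets of $Y$. A minimal sketch, with an arbitrary toy choice of $\Omega$ and $Y$:

```python
import itertools

# On a finite Omega, sigma(Y) = all preimages Y^{-1}(B), i.e. all unions
# of level sets {w : Y(w) = y}.
omega = [0, 1, 2, 3, 4, 5]
Y = lambda w: w % 3        # level sets: {0,3}, {1,4}, {2,5}

levels = {}
for w in omega:
    levels.setdefault(Y(w), set()).add(w)

sigma_Y = set()
for r in range(len(levels) + 1):
    for combo in itertools.combinations(levels.values(), r):
        sigma_Y.add(frozenset().union(*combo))

print(len(sigma_Y))  # 8 = 2^3 sets, from the empty set up to Omega itself
```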

Now, since the elements of $\sigma(Y)$ are subsets of $\Omega$, we can integrate over them using the probability measure $\mu$ that measures the sizes of sets in $\Omega$: $\int_{A_1} Y \, d\mu$ is a perfectly valid expression to write (if $A_1 \subset \Omega$). For instance, if $Y$ is the indicator of some set $A_2$, then $\int_{A_1} Y \, d\mu = \int_{A_1} \mathbb{1}_{A_2} \, d\mu = \int_{A_1 \cap A_2} 1 \, d\mu = \mu(A_1 \cap A_2)$.
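On a finite $\Omega$ this integral is just a weighted sum, so the indicator identity can be checked directly (the sets and weights below are hypothetical):

```python
# Finite Omega with a probability measure mu: integrating the indicator
# of A2 over A1 gives mu(A1 ∩ A2).
mu = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}

A1 = {"a", "b", "c"}
A2 = {"b", "c", "d"}
indicator = lambda w: 1 if w in A2 else 0

# ∫_{A1} 1_{A2} d mu, computed as a finite sum over A1
integral = sum(indicator(w) * mu[w] for w in A1)
assert abs(integral - sum(mu[w] for w in A1 & A2)) < 1e-12
print(integral)  # mu({"b", "c"}) = 0.5
```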

Now, $Z = \mathbb{E}[X \mid Y]$ is the random variable that is measurable with respect to $\sigma(Y)$ (i.e., $\sigma(Z) \subset \sigma(Y)$) and that agrees with the integral of $X$ whenever we integrate it over any set in $\sigma(Y)$. (For the purposes of this answer, just believe me when I say that such a random variable exists and is unique up to sets of measure zero.)
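The defining property $\int_A \mathbb{E}[X \mid Y] \, d\mu = \int_A X \, d\mu$ for every $A \in \sigma(Y)$ can also be checked on a finite space, where the version of $\mathbb{E}[X \mid Y]$ is simply the $\mu$-weighted average of $X$ on each level set of $Y$. The particular $\Omega$, $Y$, and $X$ below are arbitrary choices for illustration:

```python
# On a finite space, E[X | Y] averages X over each level set of Y; check
# that its integral matches that of X on every set {Y = y} in sigma(Y).
omega = list(range(6))
mu = {w: 1 / 6 for w in omega}   # uniform measure
Y = lambda w: w % 2              # sigma(Y) is generated by parity
X = lambda w: w * w

def cond_exp_X_given_Y(w):
    level = [v for v in omega if Y(v) == Y(w)]
    return sum(X(v) * mu[v] for v in level) / sum(mu[v] for v in level)

for y in {Y(w) for w in omega}:
    A = [w for w in omega if Y(w) == y]   # a generating set of sigma(Y)
    lhs = sum(cond_exp_X_given_Y(w) * mu[w] for w in A)
    rhs = sum(X(w) * mu[w] for w in A)
    assert abs(lhs - rhs) < 1e-12
print("defining property verified on the generators of sigma(Y)")
```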

Now the tower rule follows:

$\mathbb{E}[\mathbb{E}[Z \mid X_1]\mid X_1, X_2] = \mathbb{E}[Z\mid X_1]$, since $\mathbb{E}[Z \mid X_1]$ is already measurable with respect to $\sigma(X_1, X_2)$ (because $\sigma(X_1) \subset \sigma(X_1, X_2)$), and conditioning leaves such a variable unchanged.

$\mathbb{E}[\mathbb{E}[Z \mid X_1, X_2]\mid X_1] = \mathbb{E}[Z\mid X_1]$, since for any set $A$ in $\sigma(X_1)$ (which is also in $\sigma(X_1, X_2)$), we have $\int_A \mathbb{E}[Z \mid X_1, X_2] \, d\mu = \int_A Z \, d\mu = \int_A \mathbb{E}[Z \mid X_1] \, d\mu$.

As such, we get the same tower rule:

$\mathbb{E}[\mathbb{E}[Z \mid X_1]\mid X_1, X_2]$ = $ \mathbb{E}[\mathbb{E}[Z \mid X_1, X_2]\mid X_1]$ = $ \mathbb{E}[Z\mid X_1]$.
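Finally, returning to the original claim: with the tower rule in hand, $\mathbf{E}_i(\Delta_j) = \mathbf{E}_i(\mathbf{E}_j(\mathbf{Z})) - \mathbf{E}_i(\mathbf{E}_{j-1}(\mathbf{Z})) = \mathbf{E}_i(\mathbf{Z}) - \mathbf{E}_i(\mathbf{Z}) = 0$ for $j > i$. This can also be confirmed by exact enumeration; the sketch below uses three i.i.d. fair bits as an assumed toy instance of the question's setup:

```python
import itertools

# With Z = X1 + X2 + X3 for i.i.d. fair bits, check E_i(Delta_j) = 0
# for all j > i, where Delta_j = E_j(Z) - E_{j-1}(Z).
omega = list(itertools.product([0, 1], repeat=3))
n = 3

def E(f, i):
    """E_i(f) = E[f | X_1, ..., X_i] as a function on omega (i = 0 gives E[f])."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:i], []).append(f(w))
    avg = {k: sum(v) / len(v) for k, v in groups.items()}
    return lambda w: avg[w[:i]]

Z = lambda w: sum(w)
for i in range(n):
    for j in range(i + 1, n + 1):
        delta_j = lambda w, j=j: E(Z, j)(w) - E(Z, j - 1)(w)
        vanish = E(delta_j, i)          # E_i(Delta_j)
        assert all(abs(vanish(w)) < 1e-12 for w in omega)
print("E_i(Delta_j) = 0 verified for all j > i")
```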