Iterated conditional probability notation


I'm currently self-studying the third edition of Andrew Gelman's book "Bayesian Data Analysis". On page 41, the authors write:

$E(\tilde{y}|y)=E(E(\tilde{y}|\theta,y)|y)$

I am OK with multiple conditions, and usually OK with the math involved in the book, but this notation with a nested condition confuses me. I tried to find a definition of nested conditions in past personal notes/books, but could not find one. Online, I saw something similar in the "Tower property" section at https://en.wikipedia.org/wiki/Conditional_expectation#Basic_properties, but that page uses notation and concepts that the book does not use and that are a bit abstract.

I have a feeling that this is the definition I'm looking for:

$Pr[(A|B)|C]:=Pr[A|B,C]$ for events, or $E(E(x|z,y)|y):=E(E(x|z,y))=E(x|y)$ for iterated expectation of random variables.

Does someone have an online reference that defines nested conditions and confirms (or refutes) my guessed definition? If I'm wrong, what would be the meaning of $Pr[(A|B)|C]$ and $E(E(x|z,y)|y)$ (if any)?

Thank you for your help!

3 Answers

BEST ANSWER

With the help of @Mason and @William M., and after taking the time to review the law of total expectation in more depth, I found the source of my confusion: I was applying the law of total expectation incorrectly.

The law of total expectation says $E[U]=E(E[U|Z])$, but it also requires that $U$ and $Z$ be defined on the same probability space. My mistake was to take $U=(X|Y=y)$ and add the condition on $Z$, ending up with $E[E(X|Z,Y=y)]$, and then I did not understand the necessity of the extra $|Y=y$. I was adding a condition while ignoring that the random variables were living in a restricted probability space. This was my mistake.

$E[X|Y=y]=E[E(X|Z,Y=y)|Y=y]$ really is a direct consequence of $E[U]=E(E[U|Z])$, but with the restriction that everything happens in the probability space defined by the condition $Y=y$. We add the condition on $Z$, but we remain in the probability space restricted by $Y=y$. This is the meaning of the extra $|Y=y$ that I had omitted in my first attempt.

Note that $E[E(X|Z,Y=y)|Y=y]\ne E[E(X|Z,Y=y)]$ in general, and the Bayesian statistics book provides a good example. In my original problem, $E[E(\tilde{Y}|\theta ,Y)|Y]=E[\theta|Y]=\mu_1$, the posterior mean, whereas $E[E(\tilde{Y}|\theta ,Y)]=E[\theta]=\mu_0$, the prior mean.
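The gap between the prior mean $\mu_0$ and the posterior mean $\mu_1$ can be checked numerically. Below is a minimal Monte Carlo sketch (not from the book) using a normal-normal model with hypothetical values $\mu_0=2$, $\tau=\sigma=1$, $y=5$; conditioning on $Y=y$ is approximated by keeping draws whose simulated $Y$ falls in a narrow window around $y$.

```python
import numpy as np

rng = np.random.default_rng(0)

mu0, tau = 2.0, 1.0      # prior: theta ~ N(mu0, tau^2)  (hypothetical values)
sigma = 1.0              # likelihood: Y | theta ~ N(theta, sigma^2)
y_obs = 5.0              # hypothetical observed value

n = 2_000_000
theta = rng.normal(mu0, tau, n)
y = rng.normal(theta, sigma)

# E[E(Ytilde | theta, Y)] = E[theta] = mu0, the prior mean
prior_mean_mc = theta.mean()

# E[E(Ytilde | theta, Y) | Y=y] = E[theta | Y=y] = mu1, the posterior mean,
# approximated by a rejection window around the observed y
keep = np.abs(y - y_obs) < 0.05
post_mean_mc = theta[keep].mean()

# closed-form posterior mean for the normal-normal model
mu1 = (mu0 / tau**2 + y_obs / sigma**2) / (1 / tau**2 + 1 / sigma**2)

print(prior_mean_mc, post_mean_mc, mu1)
```

With these values $\mu_1 = 3.5 \ne \mu_0 = 2$, and the two Monte Carlo estimates land near those distinct targets.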

I no longer think my digression on expectations written with subscripts is necessary, so I removed it.

I hope this answer helps anyone who stumbles over the same difficulty!

PS: I think this also clarifies the question about $(A|B)|C$.

ANSWER

FOR THE SAKE OF EXPLANATION: I will denote by $t$ the sample points of the variate $\theta$, by $\tilde{\omega}$ those of $\tilde{y}$, and by $\omega$ those of $y$. The random vector $(\theta, \tilde{y}, y)$ is assumed to have a density $f_{\theta, \tilde{y}, y}(t, \tilde{\omega}, \omega)$, and we want to show that $$ E(E(\tilde{y} \mid \theta, y) \mid y) = \int dt\ \left[ \int d\tilde{\omega}\ \tilde{\omega} f_{\tilde{y} \mid \theta, y}(\tilde{\omega} \mid t, y) \right] f_{\theta \mid y}(t \mid y) $$ coincides with $$ E(\tilde{y} \mid y) = \int d\tilde{\omega}\ \tilde{\omega} f_{\cdot, \tilde{y} \mid y}(\tilde{\omega} \mid y), $$ where $\cdot$ means that the variable was integrated out and renormalised. By definition, $$ f_{\cdot, \tilde{y} \mid y}(\tilde{\omega} \mid y) = \dfrac{\int dt\ f_{\theta, \tilde{y}, y}(t, \tilde{\omega}, y)}{\int d(t, \tilde{\omega})\ f_{\theta, \tilde{y}, y}(t, \tilde{\omega}, y)}, $$ and $$ f_{\tilde{y} \mid \theta, y}(\tilde{\omega} \mid t, y)\, f_{\theta \mid y}(t \mid y) = \dfrac{f_{\theta, \tilde{y}, y}(t, \tilde{\omega}, y)}{\int d\tilde{\omega}\ f_{\theta, \tilde{y}, y}(t, \tilde{\omega}, y)} \cdot \dfrac{\int d\tilde{\omega}\ f_{\theta, \tilde{y}, y}(t, \tilde{\omega}, y)}{\int d(t, \tilde{\omega})\ f_{\theta, \tilde{y}, y}(t, \tilde{\omega}, y)}. $$ The middle factors cancel, so multiplying by $\tilde{\omega}$ and integrating yields the same expression in both cases. By the way, we must assume that $|\theta| + |\tilde{y}| + |y|$ is an integrable random variable. (The same result applies to random vectors, provided their $\mathbf{L}^1$ norms are integrable.)
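The cancellation in this answer can be verified numerically on a discrete stand-in for the joint density $f_{\theta, \tilde{y}, y}$: integrals become sums over a 3-axis array of probabilities. The array shape and random values below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# discrete stand-in for the joint density f_{theta, ytilde, y}:
# axis 0 = theta values t, axis 1 = ytilde values w~, axis 2 = y values w
p = rng.random((4, 5, 3))
p /= p.sum()

w_vals = np.arange(5.0)   # support of ytilde

y_idx = 2                 # condition on one value of y
slice_y = p[:, :, y_idx]  # f(theta, ytilde, y) at that y

# direct form: f_{ytilde | y} = (theta integrated out) / (normalising constant)
f_yt_given_y = slice_y.sum(axis=0) / slice_y.sum()
lhs = (w_vals * f_yt_given_y).sum()          # E(ytilde | y)

# nested form: E( E(ytilde | theta, y) | y )
f_theta_given_y = slice_y.sum(axis=1) / slice_y.sum()
inner = (slice_y * w_vals).sum(axis=1) / slice_y.sum(axis=1)  # E(ytilde | theta, y)
rhs = (inner * f_theta_given_y).sum()

print(lhs, rhs)
```

The two quantities agree to floating-point precision, mirroring the cancellation of the middle factors in the density identity above.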

ANSWER

$P((A \mid B) \mid C)$ does not make sense. But $E(E(X \mid Z, Y) \mid Y)$ makes sense. To parse it, first note that $E(X \mid Z, Y)$ is a random variable which is a function of $Z$ and $Y$. So we can write $E(X \mid Z, Y) = f(Z, Y)$. Now $f(Z, Y)$ is a random variable and $E(f(Z, Y) \mid Y)$ makes sense.

The equality $E(E(X \mid Z, Y) \mid Y) = E(X \mid Y)$ is a consequence of the abstract tower property. Some form of the tower property can be proven using conditional densities. You can prove this specific identity like this: \begin{align} E(X \mid Y = y) &= \int x f(x \mid y)\,dx \\ &= \int x \int f(z \mid y)f(x \mid y, z)\,dz\,dx \\ &= \int \int x f(x \mid y, z)\,dx f(z \mid y)\,dz \\ &= \int E(X \mid Y = y, Z = z) f(z \mid y)\,dz \\ &= E(E(X \mid Y = y, Z) \mid Y = y) \\ &= E(E(X \mid Y, Z) \mid Y = y). \end{align}
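The derivation above can also be checked mechanically on a small discrete joint distribution for $(X, Y, Z)$, computing both sides of the identity for every value of $y$. The support values and probabilities below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# discrete joint p(x, y, z); axes 0, 1, 2 index the supports of X, Y, Z
p = rng.random((3, 4, 5))
p /= p.sum()
x_vals = np.array([0.0, 1.0, 2.5])   # arbitrary support of X

for yi in range(4):
    sl = p[:, yi, :]                               # p(x, Y=yi, z)
    # direct: E(X | Y=yi)
    direct = (x_vals @ sl.sum(axis=1)) / sl.sum()
    # nested: sum_z E(X | Y=yi, Z=z) * p(z | Y=yi)
    e_x_given_yz = (x_vals @ sl) / sl.sum(axis=0)  # E(X | Y=yi, Z=z)
    p_z_given_y = sl.sum(axis=0) / sl.sum()
    nested = (e_x_given_yz * p_z_given_y).sum()
    assert abs(direct - nested) < 1e-12

print("E(E(X | Z, Y) | Y) = E(X | Y) holds for every y")
```

Each loop iteration is a discrete instance of the displayed derivation: the inner `e_x_given_yz` plays the role of $E(X \mid Y=y, Z=z)$, and averaging it against $p(z \mid y)$ recovers $E(X \mid Y=y)$.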