Intuition behind the expectation of a conditional expectation

Question


Asked by Bumbble Comm at 11 May 2026, 7:21

Let $(\Omega, \cal F, P)$ be a probability space and $X,Y$ two random variables defined on $(\Omega, \cal F, P)$. I stumbled upon the equality $$E[E(X \vert Y)]=E(X)$$

Observing that

$$\begin{align} E[E(X \vert Y)] & =\sum_iP(Y=Y_i)E(X\vert Y_i) \\ & = P(Y=Y_1)E(X \vert Y_1)+...+ P(Y=Y_n)E(X\vert Y_n) \\ &=E(X) \end{align}$$
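The step from the second line to the third is where the law of total probability enters. Here is a sketch of that step. It assumes $X$ takes finitely many values $X_1,\dots,X_m$ and that every $P(Y=Y_i)>0$. The index $j$ and the values $X_j$ are notation introduced here, not in the original.

$$\begin{align} \sum_i P(Y=Y_i)E(X\vert Y_i) &= \sum_i P(Y=Y_i)\sum_j X_j\,P(X=X_j\vert Y=Y_i) \\ &= \sum_j X_j \sum_i P(X=X_j,\,Y=Y_i) \\ &= \sum_j X_j\,P(X=X_j) = E(X) \end{align}$$

The second equality uses $P(Y=Y_i)\,P(X=X_j\vert Y=Y_i)=P(X=X_j,\,Y=Y_i)$ and swaps the order of summation. The third sums the joint probabilities over all values of $Y$.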

It makes sense to me "mathematically", but not yet intuitively. How does one see intuitively that the expectation of a random variable equals the expectation of its conditional expectation given some information?

Original Q&A

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2018-10-04 09:47:30

There are many ways of explaining this intuitively. I will use the equalities you provided, stick to the discrete case, and avoid going deep into the machinery of probability theory.

I claim that the sum $$\sum_i P(Y=Y_i)E(X|Y_i)$$ is just the expected value of $X$ (a weighted average of all values of $X$) in disguise. It is only a matter of how the space $\Omega$ is partitioned.

$E(X)$

To see this, let's look at what $E(X)$ does:

$$E(X) = \sum_i P(X = X_i) X_i$$

Think of the event $[X=X_i]$ as the set of all cases (all the $\omega$s) where $X(\omega) = X_i$. These events form a partition of $\Omega$: they are pairwise disjoint and their union is $\Omega$.

In $E(X)$ we are therefore looking at the value of $X$ on the set $[X=X_i]$ and weighting it with the size of the set, $P(X=X_i)$.

$E(E(X|Y))$

Now in your case: $$\sum_i P(Y=Y_i)E(X|Y_i)$$

Here, we are looking at the value of $X$ on the set $[Y=Y_i]$. But we can't just write $X_i$, since $X$ does not have to take the same value on all $\omega \in [Y=Y_i]$. So we have to average $X$ over all the cases in $[Y=Y_i]$, weighting each case by its probability, which is precisely what $E(X|Y_i)$ is. Then we just multiply this average by the size of the set, $P(Y=Y_i)$.

In both cases, we are just aggregating the $\omega$s in different ways: in $E(X)$ over the sets $[X=X_i]$, and in $E(E(X|Y))$ over the sets $[Y=Y_i]$.
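To make the regrouping concrete, here is a minimal Python sketch on a small finite sample space. The outcomes `a` to `d`, their probabilities, and the values of $X$ and $Y$ are all hypothetical. The sketch computes $E(X)$ three ways:

* outcome by outcome,
* over the sets $[X=X_i]$,
* over the sets $[Y=Y_i]$ via conditional averages.

`Fraction` keeps the arithmetic exact, so the results can be compared with `==`.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite sample space: each outcome omega has a probability P(omega),
# a value X(omega) and a value Y(omega). The numbers are made up for illustration.
P = {"a": Fraction(1, 8), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 2)}
X = {"a": 3, "b": 1, "c": 3, "d": 2}
Y = {"a": 0, "b": 0, "c": 1, "d": 1}


def level_sets(f):
    """Partition Omega into the disjoint sets [f = v], keyed by the value v."""
    blocks = defaultdict(list)
    for omega in P:
        blocks[f[omega]].append(omega)
    return dict(blocks)


def expectation_over_omegas():
    # Sum of X(omega) * P(omega) over every single outcome
    return sum(P[w] * X[w] for w in P)


def expectation_over_x_partition():
    # E(X) = sum_i P(X = X_i) * X_i, aggregating over the sets [X = X_i]
    return sum(sum(P[w] for w in block) * x for x, block in level_sets(X).items())


def expectation_over_y_partition():
    # E(E(X|Y)) = sum_i P(Y = Y_i) * E(X | Y_i), aggregating over the sets [Y = Y_i]
    total = Fraction(0)
    for block in level_sets(Y).values():
        p_block = sum(P[w] for w in block)                     # P(Y = Y_i)
        cond_mean = sum(P[w] * X[w] for w in block) / p_block  # E(X | Y_i)
        total += p_block * cond_mean
    return total


print(expectation_over_omegas(), expectation_over_x_partition(), expectation_over_y_partition())
```

For these made-up numbers all three functions return 2. The same sum over the $\omega$s is simply grouped in different ways.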

**user65203** · Answer 2 · 2018-10-04 10:06:57

$E(X)$ is the expectation of $X$ regardless of $Y$. You can also express it as a weighted average of the expectations of $X$ for each given value of $Y$, where the weights are the probabilities of the respective values of $Y$.

For example, suppose the pairs $(X,Y)$ take the values

$$(1,1),(1,2),(2,1),(2,2)$$ with respective probabilities $0.1,0.2,0.3,0.4$.

The expectation of $X$ is

$$1\cdot0.1+1\cdot0.2+2\cdot0.3+2\cdot0.4=1.7$$

The marginal probabilities of $Y$ are $0.4, 0.6$.

The conditional probabilities of $X=1$ and $X=2$ given $Y$ are

$$Y=1\to 0.25,\ 0.75,\\Y=2\to \tfrac13,\ \tfrac23.$$

The conditional expectations of $X$ are

$$Y=1\to 1\cdot0.25+2\cdot0.75=1.75,\\Y=2\to 1\cdot\tfrac13+2\cdot\tfrac23=\tfrac53\approx1.67,$$

and indeed,

$$1.75\cdot0.4+\tfrac53\cdot0.6=0.7+1=1.7.$$
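The same arithmetic can be checked mechanically. Below is a minimal Python sketch that stores the joint distribution of the example as a dictionary from $(x, y)$ pairs to probabilities. That representation is chosen here for illustration. `Fraction` keeps every probability exact, so the final comparison holds without rounding.

```python
from fractions import Fraction

# Joint distribution from the example: keys are (x, y) pairs, values are probabilities.
joint = {
    (1, 1): Fraction(1, 10),
    (1, 2): Fraction(2, 10),
    (2, 1): Fraction(3, 10),
    (2, 2): Fraction(4, 10),
}

# E(X) computed directly from the joint distribution
e_x = sum(p * x for (x, _), p in joint.items())

ys = sorted({y for _, y in joint})
# Marginal probabilities P(Y = y)
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ys}
# Conditional expectations E(X | Y = y) = sum_x x * P(X = x, Y = y) / P(Y = y)
e_x_given_y = {
    y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y] for y in ys
}

# Weighted average of the conditional expectations, i.e. E(E(X|Y))
e_e_x_given_y = sum(p_y[y] * e_x_given_y[y] for y in ys)

print(e_x, e_e_x_given_y)
```

With exact arithmetic the results are $E(X\mid Y=1)=7/4$ and $E(X\mid Y=2)=5/3$. Both `e_x` and `e_e_x_given_y` come out as $17/10 = 1.7$.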
