Intuition behind the fact that $X,Y$ i.i.d $\not \implies \mathbb E[X|A] = \mathbb E[Y]$

Question

194 Views Asked by Bumbble Comm At 21 Apr 2026 - 12:58

Let $A=\{X=Y\}$ be the event that $X$ and $Y$ take the same value. If $X$ and $Y$ are discrete, independent and identically distributed, is it true that $\mathbb E[X|A] = \mathbb E[Y]$?

I know that it is false (for a counterexample, let $X$ and $Y$ be two independent indicator variables with $p_X(1) = p_Y(1) = 1/3$), but only because I've seen the explicit calculation for a counterexample.

How might I intuitively know that the statement is false?
Is the fact that $X$ and $Y$ are discrete important?

Bumbble Comm On 22 Mar 2019 - 4:40

The key in the example is that $X$ and $Y$ are each more likely to hit the number 0 than 1. So $X$ has a better chance of equaling $Y$ when $X=0$ than when $X=1$. Hence, conditioning on $X=Y$ will "bias" $X$ toward being zero, and thus shift the expected value.

It is perhaps instructive to look at the conditional distribution of $X$ given $A$: we have $P(X=0 \mid A) = 4/5$ and $P(X=1 \mid A) = 1/5$. You can see that this is more tilted toward 0 than the unconditional distribution of $X$ was, and so the conditional expectation is closer to 0 than the unconditional expectation is.
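The numbers above can be checked by brute-force enumeration. This is a minimal sketch using exact fractions; the variable names (`pmf`, `diag`, `cond`) are illustrative choices, not anything from the original post.

```python
from fractions import Fraction
from itertools import product

# Bernoulli pmf from the question: P(X=1) = 1/3, P(X=0) = 2/3.
pmf = {0: Fraction(2, 3), 1: Fraction(1, 3)}

# Joint probabilities of (x, y) under independence, kept only where x == y.
diag = {x: pmf[x] * pmf[y] for x, y in product(pmf, repeat=2) if x == y}

p_A = sum(diag.values())                      # P(A) = P(X = Y)
cond = {x: w / p_A for x, w in diag.items()}  # P(X = x | A)

mean = sum(x * p for x, p in pmf.items())         # E[X]
cond_mean = sum(x * p for x, p in cond.items())   # E[X | A]

print("P(A) =", p_A)
print("P(X=0 | A) =", cond[0], " P(X=1 | A) =", cond[1])
print("E[X] =", mean, " E[X | A] =", cond_mean)
```

Only the diagonal outcomes $(0,0)$ and $(1,1)$ survive the conditioning, and $(0,0)$ is four times as likely as $(1,1)$, which is where the $4/5$ versus $1/5$ split comes from.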

So that's why you should not expect the implication to hold: conditioning on $A$ may bias $X$ toward its more likely values.

The only real significance of $X,Y$ being discrete here is that it ensures that $P(A) > 0$. You could get the same phenomenon if $X,Y$ had a mixed distribution. However, if $X,Y$ have a continuous distribution then $P(A) = 0$ and you cannot condition on that event.

Bumbble Comm On 22 Mar 2019 - 4:49

Intuitively, if the distribution is biased towards some values, then that bias is reinforced under the condition that the two variables take the same value. The conditional mean therefore gravitates towards the favoured values.

It is simply easier to seek a counterexample among distributions where the event $\{X=Y\}$ has nonzero probability. That means discrete distributions.

For discrete random variables $X,Y\overset{iid}\sim P$, we have $\mathsf E(X\mid X=Y) = \dfrac{\sum_s s P^2(s) }{\sum_t P^2(t) }$ and $\mathsf E(Y)=\sum_s sP(s)$.

So a good place to look for a counterexample would be somewhere $\exists s~:~P(s)\neq\sum_t P^2(t)$. That is to say, not a uniform distribution. A simple case would be a Bernoulli distribution with a biased success rate, so look at $X,Y\overset{iid}\sim \mathcal{Bern}(p)$ where $p=1/3$ (a bias towards $0$):

$$\mathsf E(X\mid X=Y)=\dfrac{p^2}{(1-p)^2+p^2}=\dfrac{1}{5}\neq \mathsf E(Y)=\dfrac 13$$
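The general formula is easy to try on other distributions. The sketch below is one way to code it, assuming the pmf is given as a dict from value to probability; the helper names `mean` and `mean_given_equal` are hypothetical, and the fair die is an added illustration of the uniform case.

```python
from fractions import Fraction

def mean(pmf):
    """E(Y) = sum_s s * P(s)."""
    return sum(s * p for s, p in pmf.items())

def mean_given_equal(pmf):
    """E(X | X = Y) = sum_s s * P(s)^2 / sum_t P(t)^2 for X, Y iid with this pmf."""
    numerator = sum(s * p * p for s, p in pmf.items())
    denominator = sum(p * p for p in pmf.values())
    return numerator / denominator

# Uniform case (fair die): squaring the weights changes nothing after renormalising.
die = {s: Fraction(1, 6) for s in range(1, 7)}

# Biased case from the answer: Bernoulli with p = 1/3.
bern = {0: Fraction(2, 3), 1: Fraction(1, 3)}

for name, pmf in [("fair die", die), ("Bern(1/3)", bern)]:
    print(name, "E(Y) =", mean(pmf), " E(X | X=Y) =", mean_given_equal(pmf))
```

For the fair die both quantities agree, while for the biased Bernoulli they differ, matching the values $1/5$ and $1/3$ above.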

Bumbble Comm On 22 Mar 2019 - 5:19

For intuition: Let $X$ denote tomorrow's temperature in place $X$ and $Y$ denote tomorrow's temperature in place $Y$. Assume that $X$, $Y$ are independent, identically distributed, and that each takes the value "hot" with very high probability and the value "cold" with very low probability. So, your assumptions apply (discrete and i.i.d.). Assume now that you learn that $X=Y$, i.e., that the weather in both places is the same. Sit back and think: what is more likely now? That it is hot in both places or cold in both places? Hot, right? Because cold means that two (not only one but two) very unlikely events occurred simultaneously. Nah, not so likely. So, this changes your expectation for $X$.

**Bumbble Comm** · Accepted Answer

Compare and contrast:

$$ \begin{align} E[X]&=\sum_{x\in \mathcal X} x\cdot p(x)\\ E[X|X=Y]&=\sum_{x\in \mathcal X} x\cdot \frac{p(x)^2}{P(X=Y)} \end{align} $$

We are summing over the same sets, and the $x$ is the same, but we have different weights. How do they compare? We see that $p(x)>p(x)^2/P(X=Y)$ if and only if $p(x)<P(X=Y)$. That is, if $p(x)$ is below the threshold $P(X=Y)$, then the conditional weight is even smaller, while if $p(x)$ is above the threshold, the weight increases. Therefore, conditioning on $X=Y$ causes a bias away from the unlikely values of $X$, and towards the likely ones.

For example, if $X$ is geometric with success probability $1/2$, then its mode is at $x=1$, and $p(x)$ decreases to zero as $x$ increases. Conditioned on $X=Y$, the distribution is skewed more towards the likely values near $x=1$, so $E[X|X=Y]$ is less than $E[X]$.

Discreteness is not actually necessary. If $X$ and $Y$ are continuous and i.i.d., then letting $Z=X-Y$, you can talk about the conditional distribution of $X$ given $Z=0$. You will find the same phenomenon: if $X$ has pdf $f(x)$, then the conditional pdf given $Z=0$ will be $f(x)^2/f_Z(0)$, so likely regions of $X$ attain a bias.
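The geometric example can be checked numerically. This is a rough sketch, assuming the geometric distribution lives on $\{1,2,\dots\}$ with $p(x)=(1/2)^x$ and truncating the infinite sums at an arbitrary cutoff `N`; the truncated tail is negligible at this cutoff.

```python
# Geometric pmf with success probability 1/2 on {1, 2, ...}: p(x) = (1/2)**x.
N = 200  # truncation point (an arbitrary choice; the remaining tail is tiny)
xs = range(1, N + 1)
p = [0.5 ** x for x in xs]

p_equal = sum(w * w for w in p)               # P(X = Y) = sum_x p(x)^2
mean = sum(x * w for x, w in zip(xs, p))      # E[X]
cond_mean = sum(x * w * w for x, w in zip(xs, p)) / p_equal  # E[X | X = Y]

# Values whose weight grows under conditioning: those with p(x) > P(X = Y).
boosted = [x for x, w in zip(xs, p) if w > p_equal]

print("P(X=Y)     ~", p_equal)
print("E[X]       ~", mean)
print("E[X | X=Y] ~", cond_mean)
print("values with increased weight:", boosted)
```

Here $P(X=Y)=1/3$, so only $x=1$ (with $p(1)=1/2$) sits above the threshold and gains weight; every other value loses weight, and the conditional mean drops from $2$ to $4/3$.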
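As a rough numerical illustration of the continuous case, the sketch below takes the formula $f(x)^2/f_Z(0)$ as given and applies it to an exponential density with rate 1. That density is an assumption chosen here for convenience, not part of the original answer, and the integrals are approximated with a plain midpoint rule on a truncated range.

```python
import math

def f(x):
    """Assumed example density: exponential with rate 1 on [0, infinity)."""
    return math.exp(-x)

def integrate(g, a, b, n):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

UPPER, STEPS = 50.0, 50_000  # truncation of [0, infinity) and grid size

# f_Z(0) for Z = X - Y with X, Y iid is the integral of f(x)^2.
f_Z0 = integrate(lambda x: f(x) ** 2, 0.0, UPPER, STEPS)

mean = integrate(lambda x: x * f(x), 0.0, UPPER, STEPS)
cond_mean = integrate(lambda x: x * f(x) ** 2 / f_Z0, 0.0, UPPER, STEPS)

print("f_Z(0)       ~", f_Z0)
print("E[X]         ~", mean)
print("E[X | Z = 0] ~", cond_mean)
```

Under these assumptions the tilted density $f(x)^2/f_Z(0)$ is $2e^{-2x}$, an exponential with rate 2, so the conditional mean is about $1/2$ against an unconditional mean of $1$: the mass moves towards the high-density region near $0$.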
