Let $A=\{X=Y\}$ be the event that $X$ and $Y$ take the same value. If $X$ and $Y$ are discrete, independent and identically distributed, is it true that $E[X|A]=E[Y]$?
I know that it is false (for a counterexample, let $X$ and $Y$ be two independent indicator variables with $p_X(1)=p_Y(1)=1/3$), but only because I've seen the explicit calculation for a counterexample.
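For reference, that calculation is short. By independence, $P(X=Y=x)=p(x)^2$, so

$$ \begin{align} P(X=Y)&=\left(\tfrac13\right)^2+\left(\tfrac23\right)^2=\tfrac59\\ E[X|X=Y]&=\frac{1\cdot(1/3)^2}{5/9}=\tfrac15\neq\tfrac13=E[Y] \end{align} $$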
How might I intuitively know that the statement is false?
Is the fact that $X$ and $Y$ are discrete important?
Compare and contrast: $$ \begin{align} E[X]&=\sum_{x\in \mathcal X} x\cdot p(x)\\ E[X|X=Y]&=\sum_{x\in \mathcal X} x\cdot \frac{p(x)^2}{P(X=Y)} \end{align} $$ The second line holds because, by independence, $P(X=x,\,Y=x)=p(x)^2$, and $P(X=Y)=\sum_{x\in \mathcal X}p(x)^2$. We are summing over the same set, and the $x$ is the same, but the weights differ. How do they compare? For $p(x)>0$, we have $p(x)>p(x)^2/P(X=Y)$ if and only if $p(x)<P(X=Y)$. That is, if $p(x)$ is below the threshold $P(X=Y)$, the conditional weight is even smaller, while if $p(x)$ is above it, the weight increases. Therefore, conditioning on $X=Y$ biases the distribution away from the unlikely values of $X$ and towards the likely ones.
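The two weighted sums are easy to evaluate exactly for any finite pmf. The sketch below is illustrative only: `conditional_mean_given_tie` is a hypothetical helper name, and it assumes the pmf is given as a dict from values to exact `Fraction` probabilities summing to 1.

```python
from fractions import Fraction

def conditional_mean_given_tie(pmf):
    """Return (E[X], E[X | X = Y]) for X, Y i.i.d. with the given finite pmf.

    Hypothetical helper: pmf maps each value x to p(x); probabilities sum to 1.
    """
    # E[X] = sum over x of x * p(x)
    mean = sum(x * p for x, p in pmf.items())
    # P(X = Y) = sum over x of p(x)^2, using independence
    p_tie = sum(p * p for p in pmf.values())
    # E[X | X = Y] = sum over x of x * p(x)^2 / P(X = Y)
    cond_mean = sum(x * p * p for x, p in pmf.items()) / p_tie
    return mean, cond_mean

# Indicator with p(1) = 1/3, the counterexample from the question
indicator = {0: Fraction(2, 3), 1: Fraction(1, 3)}
print(conditional_mean_given_tie(indicator))  # (Fraction(1, 3), Fraction(1, 5))
```

Trying a pmf such as $p(1)=1/2$, $p(2)=p(3)=1/4$ shows the threshold at work: $P(X=Y)=3/8$, so only $x=1$ gains weight, and the conditional mean drops from $7/4$ to $3/2$.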
For example, if $X$ is geometric with success probability $1/2$, then its mode is at $x=1$, and $p(x)$ decreases to zero as $x$ increases. Conditioned on $X=Y$, the distribution is skewed even more towards the likely values near $x=1$, so $E[X|X=Y]$ is less than $E[X]$.
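Taking the geometric distribution on $\{1,2,\dots\}$ (number of trials up to and including the first success, which matches the mode at $x=1$), the numbers work out explicitly; the conditional distribution turns out to be geometric again, with success probability $3/4$:

$$ \begin{align} p(x)&=2^{-x},\quad x=1,2,\dots,\qquad E[X]=2\\ P(X=Y)&=\sum_{x\ge 1}4^{-x}=\frac13\\ P(X=x\mid X=Y)&=\frac{4^{-x}}{1/3}=\frac34\left(\frac14\right)^{x-1}\\ E[X\mid X=Y]&=\frac{1}{3/4}=\frac43<2 \end{align} $$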
Discreteness is not actually necessary. If $X$ and $Y$ are continuous and i.i.d., then, letting $Z=X-Y$, you can talk about the conditional distribution of $X$ given $Z=0$. You will find the same phenomenon: if $X$ has pdf $f(x)$, then the conditional pdf given $Z=0$ is $f(x)^2/f_Z(0)$, where $f_Z(0)=\int f(x)^2\,dx$, so regions where $X$ is likely receive extra weight.
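A quick simulation can make the continuous case concrete. This is a sketch under stated assumptions: $X,Y\sim\text{Exponential}(1)$, so $f(x)=e^{-x}$, $f_Z(0)=\int_0^\infty e^{-2x}\,dx=1/2$, and the conditional pdf $2e^{-2x}$ has mean $1/2$ versus $E[X]=1$. Since $Z=0$ has probability zero, the code approximates it by $|Z|<\varepsilon$; the function name `mean_x_given_near_tie` and the choice of `eps` are hypothetical.

```python
import random

def mean_x_given_near_tie(n_samples, eps, seed=0):
    """Monte Carlo estimate of E[X | |X - Y| < eps], X, Y i.i.d. Exponential(1).

    Hypothetical helper: conditioning on |Z| < eps with Z = X - Y stands in
    for conditioning on the null event Z = 0.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    total, kept = 0.0, 0
    for _ in range(n_samples):
        x = rng.expovariate(1.0)
        y = rng.expovariate(1.0)
        if abs(x - y) < eps:  # keep only near-ties
            total += x
            kept += 1
    return total / kept

# Unconditional mean is 1; the f(x)^2 / f_Z(0) formula predicts about 1/2.
print(mean_x_given_near_tie(500_000, 0.01))
```

Note that the limit depends on conditioning through $Z=X-Y$ specifically; approximating $X=Y$ by a different family of events (for example via the ratio $X/Y$) can give a different answer.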