There is a standard textbook example where one is asked to compute the conditional expectation $E[X|Y]$ where $X$ is the number of heads in, say, 10 coin tosses, and $Y$ is the number of heads in the first 4 of these 10 tosses (let's assume a fair coin). This is fairly straightforward to calculate.
However, I am getting surprisingly confused when trying to derive $E[Y|X]$ in the same setting. Obviously, for any $k=0,1,\dots,10$, by definition we have $E[Y|X=k]=\frac{E[Y\mathbb{1}_{\{X=k\}}]}{\mathbb{P}(X=k)}$, and the denominator is very easy to compute, but what about the numerator? I know this is probably still simple, but somehow I can't get it right (this kind of 'backward' conditioning - the expected number of heads in the first 4 tosses given that there were $k$ heads altogether - has always caused me some pain in practical examples). Any help would be much appreciated :)
Do not do that.
I mean, sure, it is correct; it is just too much work.
Mathematicians are lazy; they always look to solve things with the least effort.
You should too.
The conditional distribution of heads in the first four tosses, when given $k$ among $10$ tosses are heads, will be hypergeometrically distributed. You are taking a sample of $4$ among a population of $10$ containing $k$ favored, without replacement or bias.
The mean of a $\operatorname{Hypergeom}(N,K,n)$ distribution is $nK/N$, so here it is $4k/10$.
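A quick sanity check of that mean formula, computing it directly from the hypergeometric pmf with standard-library combinatorics (a sketch; the function name `hypergeom_mean` is mine):

```python
from math import comb

def hypergeom_mean(N, K, n):
    """Mean of a Hypergeom(N, K, n) distribution, computed from the pmf
    P(j) = C(K, j) * C(N - K, n - j) / C(N, n)."""
    total = comb(N, n)
    return sum(j * comb(K, j) * comb(N - K, n - j) / total
               for j in range(0, min(K, n) + 1))

# For every possible total head count k, the mean matches n*K/N = 4k/10.
for k in range(11):
    assert abs(hypergeom_mean(10, k, 4) - 4 * k / 10) < 1e-12
```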
Alternatively:
Given that $k$ among the $10$ tosses are heads, the probability that any particular toss is a head is $k/10$ (by symmetry), so by linearity of expectation, the expected number of heads among the first four tosses, under this condition, is $4k/10$.
Thus $\Bbb E(Y\mid X=k)= 4k/10$.
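For what it's worth, the "too much work" route from the question does land in the same place, and with only $2^{10}$ equally likely outcomes it is feasible to check by brute-force enumeration (a sketch, assuming a fair coin, using exact fractions to avoid rounding):

```python
from fractions import Fraction
from itertools import product

# All 2^10 equally likely toss sequences (1 = head, 0 = tail).
outcomes = list(product((0, 1), repeat=10))

for k in range(11):
    # Numerator E[Y * 1{X=k}] and denominator P(X=k), as exact fractions.
    num = Fraction(sum(sum(w[:4]) for w in outcomes if sum(w) == k), 2**10)
    den = Fraction(sum(1 for w in outcomes if sum(w) == k), 2**10)
    assert num / den == Fraction(4 * k, 10)
```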