I need to show that the expectation $\mathbb{E}_{X,Y}[(Y-h(X))^2]$ can be written in the form $\mathbb{E}_X[\dots] + \mathbb{E}_{X,Y}[\dots]$, but I am unsure how to proceed.
$X$ and $Y$ are jointly distributed random variables, and $h$ is an arbitrary scalar-valued function of $X$. (Each "$\dots$" stands for an expression that is not given here and has to be determined.)
Namely, I need to show: $$ \mathbb{E}_{X,Y}[(Y-h(X))^2] = \mathbb{E}_X[(\mathbb{E}[Y \mid X] - h(X))^2] + \mathbb{E}_{X,Y}[(Y-\mathbb{E}[Y \mid X])^2] $$
How should I approach a proof that manipulates expectations like these?
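As a quick numerical sanity check of the identity, the sketch below estimates all three expectations by Monte Carlo under a hypothetical model chosen so that $E[Y \mid X]$ is known in closed form: $X \sim \mathrm{Uniform}(0,1)$, $Y = X^2 + \varepsilon$ with $\varepsilon \sim N(0, 0.25^2)$ independent of $X$, and the arbitrary choice $h(x) = x$. The model, the helper names `cond_mean`, `h`, `decompose`, and the sample size are assumptions for illustration only.

```python
import random

# Hypothetical model (an assumption for illustration):
#   X ~ Uniform(0, 1),  Y = X**2 + eps,  eps ~ Normal(0, 0.25**2) independent of X,
# so the conditional mean is known in closed form: E[Y | X] = X**2.
def cond_mean(x):
    return x * x

def h(x):
    # An arbitrary predictor of Y from X (hypothetical choice).
    return x

def decompose(n=200_000, sigma=0.25, seed=0):
    rng = random.Random(seed)
    lhs = term_x = term_xy = 0.0
    for _ in range(n):
        x = rng.random()
        m = cond_mean(x)
        y = m + rng.gauss(0.0, sigma)
        lhs += (y - h(x)) ** 2      # (Y - h(X))^2
        term_x += (m - h(x)) ** 2   # (E[Y|X] - h(X))^2, depends on X only
        term_xy += (y - m) ** 2     # (Y - E[Y|X])^2
    return lhs / n, term_x / n, term_xy / n

lhs, term_x, term_xy = decompose()
print(f"LHS = {lhs:.4f}, RHS = {term_x + term_xy:.4f}")
```

Under this model the exact values are $E[(X^2 - X)^2] = 1/5 - 1/2 + 1/3 = 1/30$ and $E[\varepsilon^2] = 1/16$, so both printed numbers should be close to $1/30 + 1/16 \approx 0.0958$. The two estimates differ from each other only by twice the sample cross term, which is close to zero.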
One trivial way is $E_X[0] + E_{X,Y}[(Y-h(X))^2]$. Presumably, though, the intended decomposition comes from adding and subtracting $E[Y \mid X]$, expanding the square, and putting the term that does not involve $Y$ into the first expectation and the rest into the second.
\begin{align}
&E[(Y-h(X))^2]\\
&= E[(Y-E[Y \mid X] + E[Y \mid X] - h(X))^2]\\
&= E[(Y-E[Y \mid X])^2] + E[(E[Y \mid X]-h(X))^2] + 2 E[(Y-E[Y \mid X])(E[Y \mid X]-h(X))].
\end{align}
It remains to show that the third term is zero.
\begin{align}
&E[(Y-E[Y \mid X])(E[Y \mid X]-h(X))]\\
&= E\Big[E\big[(Y-E[Y \mid X])(E[Y \mid X]-h(X)) \mid X\big]\Big] & \text{tower rule}\\
&= E\Big[(E[Y \mid X]-h(X))\,E\big[Y-E[Y \mid X] \mid X\big]\Big] & \text{$E[Y \mid X]-h(X)$ is a "constant" given $X$}\\
&= E\Big[(E[Y \mid X]-h(X))\underbrace{(E[Y \mid X] - E[Y \mid X])}_{=0}\Big] & \text{linearity; $E[Y \mid X]$ is a function of $X$}\\
&=0.
\end{align}
Finally, $E[(E[Y \mid X]-h(X))^2]$ involves only $X$, so it is the $\mathbb{E}_X[\dots]$ term, while $E[(Y-E[Y \mid X])^2]$ is the $\mathbb{E}_{X,Y}[\dots]$ term; this is exactly the requested identity. (The argument assumes $E[Y^2]$ and $E[h(X)^2]$ are finite, so that every expectation above exists.)
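The tower-rule argument for the cross term can be checked exactly on a small discrete example. The sketch below uses a hypothetical joint pmf (a dict of `Fraction` probabilities, so the arithmetic is exact) and a hypothetical predictor $h(x) = x^2 - 1$. It computes the inner conditional expectation $E[Y - E[Y \mid X] \mid X = x]$ for each $x$, then the cross term both through the tower rule and directly over the joint pmf.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y) on a small finite support
# (illustrative assumption; the probabilities sum to 1).
pmf = {
    (0, 1): F(1, 8), (0, 3): F(1, 8),
    (1, 0): F(1, 4), (1, 2): F(1, 8), (1, 5): F(1, 8),
    (2, 4): F(1, 4),
}
xs = sorted({x for x, _ in pmf})

def p_x(x):
    # Marginal p_X(x) = sum_y p(x, y)
    return sum(p for (xx, _), p in pmf.items() if xx == x)

def cond_mean(x):
    # E[Y | X = x] = sum_y y p(x, y) / p_X(x)
    return sum(y * p for (xx, y), p in pmf.items() if xx == x) / p_x(x)

def h(x):
    # An arbitrary (hypothetical) predictor of Y from X.
    return F(x * x) - 1

# Inner step: E[Y - E[Y|X] | X = x] for every x.
inner = {
    x: sum((y - cond_mean(x)) * p for (xx, y), p in pmf.items() if xx == x) / p_x(x)
    for x in xs
}
assert all(v == 0 for v in inner.values())  # vanishes for every x

# Outer step (tower rule): average (E[Y|X=x] - h(x)) * inner[x] over p_X.
cross_tower = sum(p_x(x) * (cond_mean(x) - h(x)) * inner[x] for x in xs)

# Direct computation of E[(Y - E[Y|X]) (E[Y|X] - h(X))] over the joint pmf.
cross_direct = sum(
    p * (y - cond_mean(x)) * (cond_mean(x) - h(x)) for (x, y), p in pmf.items()
)

print(cross_tower, cross_direct)
```

Both cross terms come out exactly zero, and the same holds for any `h` with exact (integer or `Fraction`) values, because the inner conditional expectation is already zero for every $x$.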
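Remark: This is a "Pythagorean theorem." $E[Y \mid X]$ is the orthogonal projection of $Y$ onto the space of square-integrable random variables of the form $g(X)$, so the residual $Y - E[Y \mid X]$ is orthogonal to every such variable, in particular to $E[Y \mid X] - h(X)$; that orthogonality is exactly the vanishing cross term. The law of total variance is the special case $h(X) \equiv E[Y]$.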
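Choosing the constant predictor $h(X) \equiv E[Y]$ turns the identity into the law of total variance, $\operatorname{Var}(Y) = \operatorname{Var}(E[Y \mid X]) + E[\operatorname{Var}(Y \mid X)]$, because $E[E[Y \mid X]] = E[Y]$ and $E[(Y - E[Y \mid X])^2 \mid X] = \operatorname{Var}(Y \mid X)$. The sketch below checks this exactly on a hypothetical discrete joint pmf; the pmf and the helper names are illustrative assumptions.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y) (illustrative assumption; sums to 1).
pmf = {(0, 0): F(1, 4), (0, 2): F(1, 4), (1, 1): F(1, 6), (1, 4): F(1, 3)}
xs = sorted({x for x, _ in pmf})

def p_x(x):
    # Marginal p_X(x)
    return sum(p for (xx, _), p in pmf.items() if xx == x)

def cond_mean(x):
    # E[Y | X = x]
    return sum(y * p for (xx, y), p in pmf.items() if xx == x) / p_x(x)

def cond_var(x):
    # Var(Y | X = x) = E[(Y - E[Y|X=x])^2 | X = x]
    m = cond_mean(x)
    return sum((y - m) ** 2 * p for (xx, y), p in pmf.items() if xx == x) / p_x(x)

mean_y = sum(y * p for (_, y), p in pmf.items())
# LHS with h(X) = E[Y]: E[(Y - E[Y])^2] = Var(Y)
var_y = sum((y - mean_y) ** 2 * p for (_, y), p in pmf.items())
# First term with h(X) = E[Y]: E_X[(E[Y|X] - E[Y])^2] = Var(E[Y|X])
var_of_cond_mean = sum(p_x(x) * (cond_mean(x) - mean_y) ** 2 for x in xs)
# Second term: E[(Y - E[Y|X])^2] = E[Var(Y|X)]
mean_of_cond_var = sum(p_x(x) * cond_var(x) for x in xs)

print(var_y, var_of_cond_mean, mean_of_cond_var)
```

It prints `5/2 1 3/2`, and indeed $5/2 = 1 + 3/2$.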