I am trying to understand the following.
We have two jointly distributed discrete random variables $X$ and $Y$, and we are trying to use $X$ to predict $Y$. Specifically, we want to choose a function $h(X)$ to predict $Y$ such that $h(X)$ is optimal; that is, $h(X)$ minimizes the $\text{MSE}=E\{[Y-h(X)]^2\}$.
Looking at this expectation, I believe that, written as a summation, it is
$$\sum_{x,y} p_{X,Y}(x,y)(y-h(x))^2,$$
where $p_{X,Y}$ is the joint pmf of $X$ and $Y$.
But here is the next step: by the law of total expectation,
$$E\{[Y-h(X)]^2\}=E(E\{[Y-h(X)]^2\mid X\}),$$ where the outer expectation is taken with respect to $X$. Recognizing that the inner expectation is minimized by setting $h(x)$ equal to $E(Y\mid X=x)$, we see how to minimize the $\text{MSE}$.
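For reference, the usual way to see that minimization step (a sketch, not part of my question) is to expand the inner expectation around the conditional mean:
$$E\{[Y-h(X)]^2 \mid X=x\} = \operatorname{Var}(Y \mid X=x) + \big[E(Y\mid X=x) - h(x)\big]^2,$$
since the cross term $2\,E\{Y - E(Y\mid X=x) \mid X=x\}\,[E(Y\mid X=x) - h(x)]$ vanishes. The first term does not depend on $h$, so the whole expression is minimized by choosing $h(x)=E(Y\mid X=x)$, which zeroes out the second term.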
My question is, would it be true that $E(E\{[Y-h(X)]^2\mid X\})$ is equal to the following double sum?
$$\sum_x \sum_y p_{Y\mid X}(y\mid x)p_X(x)(y-h(x))^2.$$
I am trying to expand out the expectation to see mathematically what is going on here. I understand the intuition behind the idea of having $h(x)$ equal to $E(Y\mid X=x)$.
Yes. Since the joint pmf factors as $p_{X,Y}(x,y) = p_X(x)\,p_{Y\mid X}(y\mid x)$, you can directly get $$\sum_x \sum_y p_{X,Y}(x,y) (y-h(x))^2 = \sum_x p_X(x) \sum_y p_{Y \mid X}(y \mid x) (y - h(x))^2.$$
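If it helps to see this numerically, here is a small sketch (using a made-up joint pmf, not anything from your problem) that computes both sides on a finite support and checks they agree:

```python
import numpy as np

# Hypothetical joint pmf over X in {0, 1}, Y in {0, 1, 2}; rows index x, columns index y.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
y_vals = np.array([0, 1, 2])

p_x = p_xy.sum(axis=1)                 # marginal p_X(x)
p_y_given_x = p_xy / p_x[:, None]      # conditional p_{Y|X}(y|x)

# Any predictor h(x) works for the identity; here h(x) = E(Y | X = x).
h = p_y_given_x @ y_vals

# Squared errors (y - h(x))^2 on a grid: rows x, columns y.
sq = (y_vals[None, :] - h[:, None]) ** 2

# Left side: single sum over the joint pmf.
mse_joint = np.sum(p_xy * sq)

# Right side: iterated sum, inner over y weighted by p_{Y|X}, outer over x weighted by p_X.
mse_iterated = np.sum(p_x * np.sum(p_y_given_x * sq, axis=1))

print(np.isclose(mse_joint, mse_iterated))
```

The two sums coincide term by term because each joint weight $p_{X,Y}(x,y)$ is just $p_X(x)\,p_{Y\mid X}(y\mid x)$ regrouped.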