Let $(X,Y) \in \Bbb X \times \Bbb Y$ be jointly distributed according to a distribution $P$. Let $h: \Bbb X \rightarrow \tilde {\Bbb Y}$, where $\tilde {\Bbb Y}$ is the space of predicted outputs. Let $L(h,P) \equiv \Bbb E_P[l(Y, h(X))]$, where $l$ is some loss function.
Show that $f = \arg \min_h L(h,P)$ is given by $f(x) = \Bbb E_P[Y \mid X = x]$ if $l$ is the square loss function: $l(y, h(x)) = (y - h(x))^2$.
I figured I would show this by showing that any other $h$ leads to an $L(h,P)$ at least as large as the one obtained with $\Bbb E_P[Y\mid X=x]$.
I start with the inequality I want to establish: $$\Bbb E_P[(y - \Bbb E_P[Y\mid X=x])^2] \le \Bbb E_P[(y - h(x))^2]$$
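As a numerical sanity check of this inequality (not a proof), here is a minimal Monte Carlo sketch. The toy model $Y = X^2 + \varepsilon$ with $X$ uniform on $[-1,1]$, the noise level, and the three competing predictors are all assumptions chosen for illustration; under this model $\Bbb E_P[Y\mid X=x] = x^2$.

```python
import random

def simulate(n=20000, seed=0):
    """Draw (x, y) pairs from a toy model where E[Y | X = x] = x^2 (assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = x * x + rng.gauss(0.0, 0.5)  # zero-mean noise around x^2
        data.append((x, y))
    return data

def empirical_risk(h, data):
    """Monte Carlo estimate of L(h, P) = E_P[(Y - h(X))^2]."""
    return sum((y - h(x)) ** 2 for x, y in data) / len(data)

data = simulate()

def cond_mean(x):
    """Conditional mean E[Y | X = x] of the toy model."""
    return x * x

# Hypothetical competing predictors h.
competitors = {
    "zero": lambda x: 0.0,
    "linear": lambda x: x,
    "shifted": lambda x: x * x + 0.1,
}

risk_f = empirical_risk(cond_mean, data)
risks = {name: empirical_risk(h, data) for name, h in competitors.items()}

print("conditional mean:", risk_f)
for name, r in risks.items():
    print(name, r)
```

In this setup the estimated risk of the conditional mean should come out below that of each competitor, which is consistent with the claim but says nothing about arbitrary $h$ or arbitrary $P$.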
Then expanding we have:
$$\Bbb E_P[y^2 - 2y\Bbb E_P[Y\mid X=x] + \Bbb E_P[Y\mid X=x]^2] \le \Bbb E_P[y^2 - 2yh(x) + h(x)^2]$$
And simplifying:
$$-2\Bbb E_P[y\,\Bbb E_P[Y\mid X=x]] + \Bbb E_P[\Bbb E_P[Y\mid X=x]^2] \le -2\Bbb E_P[yh(x)] + \Bbb E_P[h(x)^2]$$
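One fact that may be relevant to the cross term $\Bbb E_P[yh(x)]$ is the tower property: for any function $g$ of $X$, $\Bbb E_P[Y g(X)] = \Bbb E_P[\Bbb E_P[Y\mid X]\, g(X)]$. The sketch below checks this empirically on a hypothetical discrete example; the distribution of $(X, Y)$ and the test function `g` are assumptions made only for illustration, and the conditional mean is replaced by the per-group sample mean.

```python
import random
from collections import defaultdict

rng = random.Random(1)

# Hypothetical toy distribution: X uniform on {0, 1, 2}, Y = 2*X + Gaussian noise.
samples = []
for _ in range(5000):
    x = rng.randrange(3)
    y = 2.0 * x + rng.gauss(0.0, 1.0)
    samples.append((x, y))

# Empirical conditional mean of Y for each value of X.
groups = defaultdict(list)
for x, y in samples:
    groups[x].append(y)
cond_mean = {x: sum(ys) / len(ys) for x, ys in groups.items()}

def g(x):
    """Arbitrary test function of X (an assumption for illustration)."""
    return x * x - 1.0

n = len(samples)
lhs = sum(y * g(x) for x, y in samples) / n             # estimate of E[Y g(X)]
rhs = sum(cond_mean[x] * g(x) for x, _ in samples) / n  # estimate of E[E[Y|X] g(X)]

print(lhs, rhs)
```

With group means standing in for the conditional expectation, the two averages agree up to floating-point rounding, because the sum of `y * g(x)` within each group equals the group size times the group mean times `g(x)`.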
But from here I'm a little stuck as to how to continue.
Does anyone have any ideas?