The Bayes predictor of the square loss is $\Bbb E_P[Y\mid X=x]$?


Let $(X,Y) \in \Bbb X \times \Bbb Y$ be jointly distributed according to a distribution $P$. Let $h: \Bbb X \rightarrow \tilde {\Bbb Y}$, where $\tilde {\Bbb Y}$ is the space of predicted outputs, and let $L(h,P) \equiv \Bbb E_P[l(Y, h(X))]$, where $l$ is some loss function.

Show that the minimizer $f = \arg \min_h L(h,P)$ satisfies $f(x) = \Bbb E_P[Y \mid X = x]$ when $l$ is the square loss: $l(Y, h(X)) = (Y - h(X))^2$.

I figured I could show this by proving that any other $h$ leads to a larger $L(h,P)$ than the conditional mean $x \mapsto \Bbb E_P[Y\mid X=x]$ does.
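Before attempting the proof, here is a quick numerical sanity check of the claim (my own sketch, not part of the argument): I simulate a toy distribution where $\Bbb E[Y \mid X = x] = x^2$ exactly, and compare the empirical square loss of the conditional mean against a perturbed predictor.

```python
import numpy as np

# Toy distribution: X uniform on {0, 1, 2}, Y = X^2 + standard Gaussian noise,
# so E[Y | X = x] = x^2 exactly and the Bayes risk is the noise variance (= 1).
rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 3, size=n)
y = x.astype(float) ** 2 + rng.normal(0.0, 1.0, size=n)

def risk(h):
    """Empirical square loss E_P[(Y - h(X))^2]."""
    return np.mean((y - h(x)) ** 2)

bayes = risk(lambda x: x.astype(float) ** 2)        # conditional mean
other = risk(lambda x: x.astype(float) ** 2 + 0.5)  # some other predictor h

# The conditional mean should achieve the strictly smaller empirical risk.
print(bayes, other)
```

With $n$ this large, `bayes` lands near the noise variance $1$, while the shifted predictor pays an extra $0.25$, consistent with the claim being proved.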

I start with $$\Bbb E_P[(Y - \Bbb E_P[Y\mid X])^2] \le \Bbb E_P[(Y - h(X))^2]$$

Then expanding we have:

$$\Bbb E_P[Y^2-2Y\,\Bbb E_P[Y\mid X] + \Bbb E_P[Y\mid X]^2] \le \Bbb E_P[Y^2 - 2Y h(X) + h(X)^2]$$

And simplifying:

$$-2\,\Bbb E_P[Y\,\Bbb E_P[Y\mid X]] + \Bbb E_P[\Bbb E_P[Y\mid X]^2] \le -2\,\Bbb E_P[Y h(X)] + \Bbb E_P[h(X)^2]$$

(Note that $\Bbb E_P[Y\mid X]$ is a function of $X$, so it cannot be pulled outside the outer expectation as a constant.)

But from here I'm a little stuck as to how to continue.

Does anyone have any ideas?