Properties of the best linear predictor?

Consider two scalar random variables $Y, X$. The best linear predictor of $Y$ given $X$ under the squared loss function is $X\theta_0$, where $\theta_0=\operatorname{argmin}_{\theta} \mathbb{E}[(Y-X\theta)^2]=(\mathbb{E}(X^2))^{-1}\mathbb{E}(XY)$. Let $\epsilon_{\theta_0}:=Y-X\theta_0$. While it is clear that $\mathbb{E}(X\epsilon_{\theta_0})=0$, I can't see why $\mathbb{E}(\epsilon_{\theta_0})=0$. Any idea?
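A quick numerical check of both moment conditions can help before attempting a proof. The sketch below uses sample moments in place of expectations and an assumed, purely hypothetical data-generating process: $X \sim N(1,1)$ and $Y = 2 + 3X + U$ with $U \sim N(0,1)$ independent of $X$. The helper names `best_linear_predictor` and `residual_moments` are illustrative only. Under this assumed process the population values are $\theta_0 = \mathbb{E}(XY)/\mathbb{E}(X^2) = 8/2 = 4$ and $\mathbb{E}(\epsilon_{\theta_0}) = \mathbb{E}(Y) - 4\,\mathbb{E}(X) = 5 - 4 = 1$, so the sample analogue of $\mathbb{E}(X\epsilon_{\theta_0})$ vanishes (it is the first-order condition) while the sample mean of $\epsilon_{\theta_0}$ stays near 1.

```python
import random


def best_linear_predictor(xs, ys):
    """Sample analogue of theta_0 = E(X^2)^{-1} E(XY) (no intercept term)."""
    n = len(xs)
    exx = sum(x * x for x in xs) / n
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    return exy / exx


def residual_moments(xs, ys, theta):
    """Return the sample means of X*eps and eps, where eps = Y - X*theta."""
    n = len(xs)
    eps = [y - x * theta for x, y in zip(xs, ys)]
    mean_x_eps = sum(x * e for x, e in zip(xs, eps)) / n
    mean_eps = sum(eps) / n
    return mean_x_eps, mean_eps


# Hypothetical data-generating process (an assumption for illustration only):
# X ~ N(1, 1), Y = 2 + 3X + U with U ~ N(0, 1) independent of X.
rng = random.Random(0)
n = 100_000
xs = [rng.gauss(1.0, 1.0) for _ in range(n)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

theta0 = best_linear_predictor(xs, ys)
mean_x_eps, mean_eps = residual_moments(xs, ys, theta0)
print(f"theta0    = {theta0:.3f}")
print(f"E(X*eps) ~ {mean_x_eps:.2e}")  # zero up to floating-point error
print(f"E(eps)   ~ {mean_eps:.3f}")    # not zero for this assumed process
```

Changing the assumed process (for example, giving $X$ mean zero and dropping the constant 2) is an easy way to probe when the second moment condition does or does not hold.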