Let's consider the 1D case. How do we prove that the error variance of the Best Linear Predictor (BLP) is no greater than that of the Proportional Predictor (i.e. the linear predictor without an intercept)?
Recall that the BLP of $Y$ given $X$ is: \begin{equation} BLP(X) = \alpha + \beta X \end{equation} with $\beta = \frac{Cov(X,Y)}{Var(X)}$ and $\alpha = E(Y) - \beta E(X)$.
The Proportional Predictor of $Y$ given $X$ is: \begin{equation} PP(X) = \gamma X \end{equation} with $\gamma = \frac{E(XY)}{E(X^2)}$
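For completeness, the formula for $\gamma$ comes from minimizing the mean squared error over all proportional predictors, i.e. over the single parameter $\gamma$:

\begin{align*}
\frac{d}{d\gamma} E\big[(Y - \gamma X)^2\big] = -2E(XY) + 2\gamma E(X^2) = 0
\quad\Longrightarrow\quad
\gamma = \frac{E(XY)}{E(X^2)}.
\end{align*}

The second derivative is $2E(X^2) > 0$ (assuming $X$ is not identically zero), so this is indeed the minimizer.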
I found that the error variance of the BLP is: $E(e_{BLP}^2) = Var(Y) - \frac{Cov(X,Y)^2}{Var(X)}$
and that the error variance of the Proportional Predictor is: $E(e_{PP}^2) = E(Y^2) - \frac{E(XY)^2}{E(X^2)}$.
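The second formula follows directly by substituting the optimal $\gamma$ into the expanded square:

\begin{align*}
E(e_{PP}^2) &= E\big[(Y - \gamma X)^2\big]
= E(Y^2) - 2\gamma E(XY) + \gamma^2 E(X^2) \\
&= E(Y^2) - 2\frac{E(XY)^2}{E(X^2)} + \frac{E(XY)^2}{E(X^2)}
= E(Y^2) - \frac{E(XY)^2}{E(X^2)}.
\end{align*}

The formula for $E(e_{BLP}^2)$ is obtained the same way, with $(\alpha, \beta)$ in place of $\gamma$.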
I'm a bit stuck trying to show $E(e_{BLP}^2) \leq E(e_{PP}^2)$, i.e. that
\begin{equation} Var(Y) - \frac{Cov(X,Y)^2}{Var(X)} \leq E(Y^2) - \frac{E(XY)^2}{E(X^2)} \end{equation}
Here is the proof.
Let's consider the difference $\Delta = E(e_{PP}^2) - E(e_{BLP}^2)$. The key substitutions below are $E(Y^2) - Var(Y) = E(Y)^2$, $Var(X) = E(X^2) - E(X)^2$, and $Cov(X,Y) = E(XY) - E(X)E(Y)$:
\begin{align*}
\Delta &= E(e_{PP}^2) - E(e_{BLP}^2) \\
&= E(Y^2) - \frac{E(XY)^2}{E(X^2)} - Var(Y) + \frac{Cov(X,Y)^2}{Var(X)} \\
&= E(Y)^2 + \frac{Cov(X,Y)^2 E(X^2) - E(XY)^2 Var(X)}{E(X^2) Var(X)} \\
&= E(Y)^2 + \frac{Cov(X,Y)^2 E(X^2) - E(XY)^2 E(X^2) + E(XY)^2 E(X)^2}{E(X^2) Var(X)} \\
&= E(Y)^2 + \frac{\big(E(XY)^2 + E(X)^2E(Y)^2 - 2E(XY)E(X)E(Y)\big) E(X^2) - E(XY)^2 E(X^2) + E(XY)^2 E(X)^2}{E(X^2) Var(X)} \\
&= E(Y)^2 + \frac{\big(E(X)^2E(Y)^2 - 2E(XY)E(X)E(Y)\big) E(X^2) + E(XY)^2 E(X)^2}{E(X^2) Var(X)} \\
&= \frac{E(Y)^2E(X^2)\big(Var(X)+E(X)^2\big) - 2E(XY)E(X)E(Y)E(X^2) + E(XY)^2 E(X)^2}{E(X^2) Var(X)} \\
&= \frac{E(Y)^2E(X^2)^2 - 2E(XY)E(X)E(Y)E(X^2) + E(XY)^2 E(X)^2}{E(X^2) Var(X)} \\
&= \frac{\big(E(Y)E(X^2) - E(XY)E(X)\big)^2}{E(X^2)Var(X)} \geq 0,
\end{align*}
since the numerator is a square and the denominator $E(X^2)Var(X)$ is positive (assuming $X$ is non-degenerate). So we have proved that $\Delta \geq 0$, i.e. $E(e_{PP}^2) \geq E(e_{BLP}^2)$: the proportional predictor's MSE is at least as large as the BLP's, with equality exactly when $E(Y)E(X^2) = E(XY)E(X)$. This is as expected, since the BLP minimizes MSE over all affine predictors $a + bX$, a class that contains every proportional predictor $\gamma X$.
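As a sanity check, the identity for $\Delta$ can be verified numerically. The sketch below (with arbitrary example parameters for the simulated data) fits both predictors on a sample, computes their in-sample MSEs, and compares their gap with the closed-form expression from the proof. Since the derivation is a pure moment identity, it holds exactly for the empirical moments as well:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with a nonzero intercept, so the BLP and the
# proportional predictor genuinely differ (assumed example values).
n = 200_000
x = rng.normal(loc=2.0, scale=1.0, size=n)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=n)

# BLP coefficients: beta = Cov(X,Y)/Var(X), alpha = E(Y) - beta*E(X).
# bias=True / default ddof=0 keep all moments on the same convention.
beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha = y.mean() - beta * x.mean()

# Proportional predictor: gamma = E(XY)/E(X^2)
gamma = np.mean(x * y) / np.mean(x ** 2)

mse_blp = np.mean((y - (alpha + beta * x)) ** 2)
mse_pp = np.mean((y - gamma * x) ** 2)

# Closed form of the gap from the proof:
# Delta = (E(Y)E(X^2) - E(XY)E(X))^2 / (E(X^2) Var(X))
delta = (y.mean() * np.mean(x ** 2) - np.mean(x * y) * x.mean()) ** 2 \
        / (np.mean(x ** 2) * np.var(x))
```

Here `mse_blp <= mse_pp` holds and `mse_pp - mse_blp` matches `delta` up to floating-point error.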