Expected error of best possible linear fit?


I asked the following question on Stats SE, but I could not get a mathematically rigorous answer, so I have decided to ask it here again.

In my textbook, on the topic of linear regression/machine learning, there is a statement quoted without proof or rigorous justification:

Consider a noisy target, $ y = (w^{*})^T \textbf{x} + \epsilon $, for generating the data, where $\epsilon$ is a noise term with zero mean and $\sigma^2$ variance, independently generated for every example $(\textbf{x},y)$. The expected error of the best possible linear fit to this target is thus $\sigma^2$.

The phrase "the expected error of the best possible linear fit to this target" confuses me. Is the expectation operator over the random variables $\textbf{x}$ and $y$? If so, is $\epsilon$ a random variable $\epsilon(\textbf{x},y)$, i.e. a function of two random variables? Can anyone show me a mathematical proof or explanation? How do we know what the best linear fit is? How do we calculate that the error is $\sigma^2$?


There is 1 answer below.


I am not sure, but I guess what the author means by the best linear fit is the one that uses the true parameter $w^{*}$, that is, the fit $\hat{y} = (w^{*})^T\textbf{x}$. In that case the expected error (usually measured as the expected squared difference) is $$ E\big((y - \hat{y})^2\big) = E\Big(\big((w^{*})^T\textbf{x} + \epsilon - (w^{*})^T\textbf{x}\big)^2\Big) = E(\epsilon^2) = \sigma^2. $$ This is indeed the best possible: for any other linear fit $\hat{y} = w^T\textbf{x}$, since $\epsilon$ has zero mean and is independent of $\textbf{x}$, the cross term vanishes and $$ E\big((y - \hat{y})^2\big) = E\Big(\big((w^{*} - w)^T\textbf{x}\big)^2\Big) + \sigma^2 \ge \sigma^2, $$ with equality when $w = w^{*}$.
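As a sanity check, the claim is easy to verify numerically: if we predict with the true $w^{*}$, the mean squared error of the predictions should converge to $\sigma^2$. A minimal Monte Carlo sketch with NumPy (the dimension, true weights, noise level, and sample size below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 3, 1_000_000          # input dimension and sample count (arbitrary)
w_star = rng.normal(size=d)  # true parameter w* (arbitrary)
sigma = 0.5                  # noise standard deviation, so sigma^2 = 0.25

# Generate data from the noisy target y = (w*)^T x + eps
X = rng.normal(size=(n, d))
eps = rng.normal(scale=sigma, size=n)
y = X @ w_star + eps

# The "best possible" linear fit predicts with the true w*
y_hat = X @ w_star

# Mean squared error; by the argument above this estimates sigma^2
mse = np.mean((y - y_hat) ** 2)
print(mse)
```

With a million samples the printed value should sit very close to $\sigma^2 = 0.25$; replacing `w_star` in the prediction line with any other vector only increases the measured error, matching the inequality above.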