how is the formula for linear regression connected to the fact that the conditional mean is the optimal estimator?


I understand that given some explanatory variables $X_i$, the best prediction in the sense of minimizing the least-squares expected error for the dependent variable Y is the conditional mean.
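For reference, the standard decomposition behind that claim: for any predictor $f(X)$,

$$\mathbb{E}\big[(Y - f(X))^2\big] = \mathbb{E}\big[(Y - \mathbb{E}[Y\mid X])^2\big] + \mathbb{E}\big[(\mathbb{E}[Y\mid X] - f(X))^2\big],$$

where the cross term vanishes by the tower property, so the expected squared error is minimized by choosing $f(X) = \mathbb{E}[Y \mid X]$.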

But I don't understand how the formula $\beta_0 + \sum_i \beta_i X_i$ is derived from that.

Reading some blogs and scripts, I am not quite sure what the correct way of looking at this is:

  • We just assume that we can estimate the conditional mean with a linear function (because linear functions are easy to work with), and under certain assumptions this turns out to be correct

or

  • No, no, it must be a sum of all the $X_i$, weighted by $\beta_i$, because the math says so (which means I don't understand the derivation)

I heard someone say in a video that general linear models (of which linear regression is an instance) assume that the explanatory effects simply add up.

This makes it sound like the first way of looking at it is the better way.

Some clarification on this is highly appreciated!



On BEST ANSWER

It's not the linear function that minimizes the MSE; it's how you estimate its parameters. The conditional mean is the optimal predictor regardless of its functional form. If you restrict yourself to a linear estimator and minimize the MSE over its coefficients, there are nice closed-form linear algebra formulas, called the normal equations, that save you from having to numerically optimize over the $\beta_i$.
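For concreteness, here is a minimal sketch (assuming NumPy, with made-up simulated data) of the normal equations $\hat\beta = (X^\top X)^{-1} X^\top y$ for a linear model with an intercept:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 2 + 3*x + small noise
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=100)

# Design matrix: a column of ones (intercept) plus the regressor
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X'X) beta = X'y instead of inverting X'X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

No iterative optimization is needed; `beta_hat` recovers approximately the intercept 2 and slope 3 used to generate the data.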

However, if you are in nonlinear territory, then the MSE may be just as hard to optimize as any other objective, and you typically have to resort to numerical methods.
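To illustrate the nonlinear case, here is a minimal sketch (assuming NumPy, with a made-up model $y = e^{bx}$): there is no normal-equation shortcut for $b$, so the MSE is minimized by a crude numerical search instead.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = np.exp(0.5 * x)  # noiseless data from a nonlinear model with b = 0.5

# No closed-form solution here: minimize the MSE by brute-force grid search
grid = np.linspace(0.0, 1.0, 1001)
mse = np.array([np.mean((np.exp(b * x) - y) ** 2) for b in grid])
b_hat = grid[int(np.argmin(mse))]
```

In practice one would use a proper optimizer (e.g. gradient descent) rather than a grid, but the point stands: the linear structure is what makes the closed-form solution possible.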