Why does least squares give a minimum?


I was reading about the least squares method, and every book I read just says that we can get the minimum value by solving a system of equations. For example, if I have $$ Q=\sum(Y_i-\beta_0-\beta_1X_i)^2 $$ then solving $$ \frac{\partial Q}{\partial \beta_0}=0 $$ $$ \frac{\partial Q}{\partial \beta_1}=0 $$ gives a minimum value. But my question is: how do I know that the solution is a minimum and not a maximum or a saddle point?


There are 2 best solutions below


Once you solve that system of equations you get a critical point. To verify that it is a minimum, we can apply the Hessian matrix test. The determinant of the Hessian is $4\left(n\sum X_i^2-\left(\sum X_i\right)^2\right)$, which is positive as long as the $X_i$ are not all equal. Intuitively, after seeing that the determinant is positive, we want $Q_{\beta_0 \beta_0}$ and $Q_{\beta_1 \beta_1}$ to both be positive at our point. This means that at our critical point, no matter what direction we go in, the graph is concave up, so we have a minimum value. Calculating gives $Q_{\beta_0 \beta_0} = \sum 2 = 2n$ and $Q_{\beta_1 \beta_1} = \sum 2X_i^2$, which are both positive.
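As a rough numerical sketch of the Hessian test above: the function names and the sample data below are made up for illustration and are not part of the original answer. The Hessian of $Q$ depends only on the $X_i$, so the $Y_i$ are not needed.

```python
def hessian(xs):
    """Hessian of Q(b0, b1) = sum((y - b0 - b1*x)**2) as a 2x2 nested list."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Q_b0b0 = 2n, Q_b0b1 = 2*sum(x), Q_b1b1 = 2*sum(x^2)
    return [[2 * n, 2 * sx], [2 * sx, 2 * sxx]]


def is_positive_definite_2x2(h):
    """Second-derivative test: top-left entry and determinant both positive."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return h[0][0] > 0 and det > 0


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample data
H = hessian(xs)
print(H, is_positive_definite_2x2(H))

# Degenerate case: every X_i equal, so the determinant is zero
print(is_positive_definite_2x2(hessian([2.0, 2.0, 2.0])))
```

When all the $X_i$ are equal the test is inconclusive, which matches the fact that a line cannot be fitted uniquely through points sharing a single $X$ value.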


We know it's a minimum because each term is a positive parabola. For example, the equation $y = x^2$ is a positive parabola and has a minimum. Its 2nd derivative is positive, indicating that it is concave up everywhere.

$y'' = 2$

So anywhere on the curve of $y = x^2$ where $y'=0$ is a minimum.

In the case of

$$Q=\sum(Y_i-\beta_0-\beta_1X_i)^2$$

This is a positive paraboloid, because the $\beta_0^2$ and $\beta_1^2$ terms have positive coefficients. It's the same concept, but in 2 dimensions.

$$Q=\sum(Y_i-(\beta_0+\beta_1X_i))^2$$

$$Q=\sum(Y_i^2-2Y_i\beta_0-2Y_i\beta_1X_i+\beta_0^2+2\beta_0\beta_1X_i+\beta_1^2X_i^2)$$

This is a polynomial in 2 variables. The independent variables $\beta_0$ and $\beta_1$ have 2nd-order highest terms, $1\beta_0^2$ and $X_i^2\beta_1^2$. Since the coefficients 1 and $X_i^2$ are positive (the latter whenever $X_i \neq 0$), the surface is a positive parabola in both the $\beta_0$ direction and the $\beta_1$ direction. You could take the 2nd partial derivative of each term with respect to $\beta_0$ and get 2, or with respect to $\beta_1$ and get $2X_i^2$. Since both of these are positive, and the cross term $2\beta_0\beta_1X_i$ does not change the picture as long as the $X_i$ are not all equal, any point where both partial derivatives are 0 is a minimum and not a maximum.
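A small sanity check of the paraboloid picture, as a sketch only: the data, the helper names `q` and `fit`, and the step size are all assumptions chosen for illustration. It finds the critical point from the usual closed-form solution of the two equations and confirms that $Q$ is larger at nearby points in several directions.

```python
def q(b0, b1, xs, ys):
    """Sum of squared residuals Q(b0, b1)."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


def fit(xs, ys):
    """Solve dQ/db0 = 0 and dQ/db1 = 0 in closed form."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample data
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = fit(xs, ys)
q_min = q(b0, b1, xs, ys)

# Step away from the critical point in several directions, including diagonals
steps = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1), (0.1, -0.1), (-0.1, 0.1)]
all_higher = all(q(b0 + d0, b1 + d1, xs, ys) > q_min for d0, d1 in steps)
print(b0, b1, all_higher)
```

Checking a handful of directions is not a proof, but it illustrates that the critical point sits at the bottom of the bowl rather than on a saddle.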