Is minimisation in linear regression different from minimising polynomials?

Given a set of data points,

$$(x_1, y_1), (x_2, y_2), (x_3, y_3), ... , (x_n, y_n)$$

that we are trying to fit to the straight line,

$$\hat{y} = \hat{a} + \hat{b}x$$

the sum of the squares of the errors $\hat{y} - y$ is

$$ S = (\hat{a} + \hat{b}x_1 - y_1)^2 + (\hat{a} + \hat{b}x_2 - y_2)^2 + ... + (\hat{a} + \hat{b}x_n - y_n)^2 $$

To minimize, set both partial derivatives to zero:

$$\begin{cases} \dfrac{\partial S}{\partial \hat{a}} = 2(\hat{a} + \hat{b}x_1 - y_1) + 2(\hat{a} + \hat{b}x_2 - y_2) + ... + 2(\hat{a} + \hat{b}x_n - y_n) = 0\\ \\ \dfrac{\partial S}{\partial \hat{b}} = 2x_1(\hat{a} + \hat{b}x_1 - y_1) + 2x_2(\hat{a} + \hat{b}x_2 - y_2) + ... + 2x_n(\hat{a} + \hat{b}x_n - y_n) = 0\\ \end{cases}$$

Why do we assume that setting $\dfrac{\partial S}{\partial \hat{a}}=0$ will find the value of $\hat{a}$ that minimises $S$ (and likewise for $\dfrac{\partial S}{\partial \hat{b}}$)?

I'm used to testing whether the value is in fact a minimum (i.e. whether $\dfrac{\partial^2 S}{\partial \hat{a}^2} > 0$), but the textbook makes no mention of this.

Is minimization in linear regression different from non-linear optimization problems, so that it somehow doesn't need a maximum/minimum check? A related question: why doesn't the procedure above find values of $\hat{a}$ and $\hat{b}$ that maximise $S$ instead?

TIA

Best answer

Yes, you can definitely look at the second derivatives to check that the critical point is indeed a minimum. More formally, because $S$ depends on two variables, you need to look at the Hessian matrix to rule out a saddle point; checking $\dfrac{\partial^2 S}{\partial \hat{a}^2} > 0$ on its own is not enough. If you do that, you can confirm that everything is all right.
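To make the Hessian check concrete, here is a sketch of the computation, obtained by differentiating the two first-order conditions in the question once more. The symbols $H$ (the Hessian) and $\bar{x}$ (the mean of the $x_i$) are my own notation, not the textbook's.

$$H = \begin{pmatrix} \dfrac{\partial^2 S}{\partial \hat{a}^2} & \dfrac{\partial^2 S}{\partial \hat{a}\,\partial \hat{b}} \\ \dfrac{\partial^2 S}{\partial \hat{b}\,\partial \hat{a}} & \dfrac{\partial^2 S}{\partial \hat{b}^2} \end{pmatrix} = 2\begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

$$\det H = 4\left(n\sum_{i=1}^{n} x_i^2 - \Big(\sum_{i=1}^{n} x_i\Big)^2\right) = 4n\sum_{i=1}^{n} (x_i - \bar{x})^2 \geq 0$$

Since $\dfrac{\partial^2 S}{\partial \hat{a}^2} = 2n > 0$ and $\det H > 0$ whenever the $x_i$ are not all equal, $H$ is positive definite in that case. Note that $H$ does not depend on $\hat{a}$ or $\hat{b}$ at all, so the same conclusion holds at every point and not just at the critical point.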

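Alternatively, and I admit I'm going to be a little hand-wavy here, you can use the fact that $S$ is a sum of squared terms and so must be bounded from below (at the very least, $S \geq 0$). Since $S$ is differentiable everywhere (so it has no weird discontinuities) and has a unique critical point (there's only one solution to $\frac{\partial S}{\partial \hat{a}} = \frac{\partial S}{\partial \hat{b}} = 0$, provided the $x_i$ are not all equal), that critical point must be a minimum. The step that makes this rigorous is convexity: each term of $S$ is the square of an expression that is linear in $\hat{a}$ and $\hat{b}$, so $S$ is convex, and a critical point of a differentiable convex function is always a global minimum. This is also why the procedure can't land on a maximum: $S$ can be made arbitrarily large by taking $\hat{a}$ large, so it has no maximum to find.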
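If you'd like to sanity-check this numerically, here is a minimal Python sketch. It uses made-up data and helper names of my own (`fit_line`, `sum_sq`, `hessian`), none of which come from the textbook. It solves the two first-order conditions from the question as a 2×2 linear system, then checks that the matrix of second derivatives of $S$ has only positive eigenvalues and that nudging the solution makes $S$ larger.

```python
import numpy as np

def fit_line(x, y):
    """Solve dS/da = 0 and dS/db = 0 for (a, b)."""
    n = len(x)
    # Dividing both conditions by 2 gives the linear system
    # [[n, sum x], [sum x, sum x^2]] @ [a, b] = [sum y, sum x*y]
    A = np.array([[n, x.sum()], [x.sum(), (x * x).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    a, b = np.linalg.solve(A, rhs)
    return a, b

def sum_sq(a, b, x, y):
    """Sum of squared errors S for the line a + b*x."""
    return float(((a + b * x - y) ** 2).sum())

def hessian(x):
    """Second derivatives of S; they do not depend on a or b."""
    n = len(x)
    return 2.0 * np.array([[n, x.sum()], [x.sum(), (x * x).sum()]])

# Made-up data, purely for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

a_hat, b_hat = fit_line(x, y)
s_min = sum_sq(a_hat, b_hat, x, y)

# Both eigenvalues positive means the Hessian is positive definite
eigs = np.linalg.eigvalsh(hessian(x))
print("eigenvalues of Hessian:", eigs)

# Moving away from the critical point in any direction should increase S
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.1, -0.1)]:
    print(da, db, sum_sq(a_hat + da, b_hat + db, x, y) > s_min)
```

The system is solved with `np.linalg.solve` rather than an explicit formula for $\hat{a}$ and $\hat{b}$ only to keep the code close to the two equations in the question; `np.polyfit(x, y, 1)` should give the same coefficients (in the order slope, intercept).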