In this Khan Academy video series Khan goes through the derivation of the formula for the linear regression line for some data points.
The only part I do not understand is the one I've given a link to. Particularly, I don't understand why Khan is so sure that when he sets the partial derivatives to zero, he is going to get the squared error function at its minimum (as opposed to its maximum).
How does he know that? He doesn't explain this in the video, so I believe it must be more or less obvious.
A short answer explaining this in simple terms would be much appreciated.
That the critical point corresponds to a minimum of the squared error is intuitively plausible, but it is not entirely trivial to prove in general.
In the simplest case the crucial fact is that we are minimizing a function of two variables:
$$e=f(m,b)=\sum (mx_i+b-y_i)^2$$
and at the critical point ($\nabla f=0$) we must also verify that $f_{mm}>0$ and that the determinant of the Hessian matrix is positive:
$$\begin{vmatrix} f_{mm}&f_{mb}\\f_{mb}&f_{bb} \end{vmatrix}=f_{mm}f_{bb}-(f_{mb})^2>0$$
The positivity of this determinant can be shown by induction or by the Cauchy–Schwarz inequality.
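To sketch the Cauchy–Schwarz route: the second partials of $f$ are constant (here $n$ is the number of data points), so the determinant can be computed once and for all:
$$f_{mm}=2\sum x_i^2,\qquad f_{bb}=2n,\qquad f_{mb}=2\sum x_i,$$
hence
$$f_{mm}f_{bb}-(f_{mb})^2=4\left(n\sum x_i^2-\Big(\sum x_i\Big)^2\right)>0,$$
where the inequality is Cauchy–Schwarz applied to the vectors $(x_1,\dots,x_n)$ and $(1,\dots,1)$, and it is strict as long as the $x_i$ are not all equal. Since also $f_{mm}>0$, the critical point is a minimum.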
Here is a nice derivation: A Quick Proof that the Least Squares Formulas Give a Local Minimum.
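If you want to see the claim numerically, here is a small sketch (the data points are made up for illustration): it solves the normal equations obtained by setting the gradient to zero, then checks that the Hessian determinant is positive and that perturbing $(m,b)$ only increases the squared error.

```python
import numpy as np

# Illustrative data points (any x_i that are not all equal will do)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 2.9, 4.2, 4.8])
n = len(x)

# Normal equations: gradient of f(m, b) = sum (m x_i + b - y_i)^2 set to zero
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    n        ]])
rhs = np.array([np.sum(x * y), np.sum(y)])
m, b = np.linalg.solve(A, rhs)

# The Hessian of f is constant: f_mm = 2*sum(x^2), f_mb = 2*sum(x), f_bb = 2n
H = 2 * A
det_H = np.linalg.det(H)

def sse(m, b):
    """Squared error for slope m and intercept b."""
    return np.sum((m * x + b - y)**2)

# Second-derivative test: det(H) > 0 and f_mm > 0, so (m, b) is a minimum
assert det_H > 0 and H[0, 0] > 0

# Sanity check: moving away from the critical point increases the error
assert sse(m, b) < sse(m + 0.1, b)
assert sse(m, b) < sse(m, b + 0.1)
print(m, b, det_H)
```

Since the Hessian does not depend on $(m,b)$ at all, $f$ is convex on the whole plane, which is why the unique critical point is a global (not just local) minimum.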