Differentiation for least squares method?


Is there any reason that we use mathematical differentiation in the least squares method for regression analysis? The theory says we use differentiation, supposing the sum of errors is 0. I don't really understand how differentiation helps in the least squares method. Can somebody explain this?


For illustration purposes, let us take the simplest case, linear regression. You have a data set containing $N$ data points $(x(i), y(i))$ and you look for the line $y = a + b\,x$ which "best" represents your data.

Since we typically assume that there is no error in the $x$'s and that the errors in the $y$'s are normally distributed, the most classical objective function built for this kind of problem is the sum of squares (SSQ) of the errors in the $y$'s, that is to say

$$SSQ(a,b) = \sum _{i=1}^N (a+b x(i) -y(i))^2$$ and, to make things as good as possible, we want $SSQ$ to be as small as possible. In other words, we want to find the values of the unknown parameters $a$ and $b$ at which $SSQ$ attains its minimum. So, the problem is exactly the problem of finding the minimum of a given function.
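To make this concrete, here is a minimal sketch of how $SSQ$ depends on the choice of $(a,b)$; the data set and the two candidate parameter pairs are made up purely for illustration:

```python
# Made-up data points for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

def ssq(a, b):
    """Sum of squared errors of the line y = a + b*x on the data."""
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))

# SSQ is a function of (a, b): a better-fitting line gives a smaller value.
print(ssq(0.0, 1.0))  # a poor guess for the line
print(ssq(1.0, 2.0))  # a guess much closer to the trend of the data
```

Minimizing $SSQ$ means finding the pair $(a,b)$ where this function is smallest, which is exactly what setting the derivatives to zero accomplishes.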

The minimum is obtained precisely when the derivatives of $SSQ(a,b)$ with respect to $a$ and with respect to $b$ are both zero at the same point. These derivatives lead to two linear equations in $a$ and $b$ (they are usually called the normal equations); from these equations, the optimum values of the parameters $a$ and $b$, which define the best regression line, are immediately extracted.
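For completeness, writing out those two derivatives and setting them to zero gives

$$\frac{\partial\, SSQ}{\partial a} = 2\sum_{i=1}^N \big(a + b\, x(i) - y(i)\big) = 0, \qquad \frac{\partial\, SSQ}{\partial b} = 2\sum_{i=1}^N x(i)\,\big(a + b\, x(i) - y(i)\big) = 0,$$

which rearrange into the normal equations, a linear system in $a$ and $b$:

$$N a + b \sum_{i=1}^N x(i) = \sum_{i=1}^N y(i), \qquad a \sum_{i=1}^N x(i) + b \sum_{i=1}^N x(i)^2 = \sum_{i=1}^N x(i)\, y(i).$$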

I hope you now see that, in this problem, differentiation is used simply to express that we want the sum of squared errors to be a minimum.

To be sure, we could just have posed the problem as "minimize $SSQ(a,b)$". But, in order to solve it as a regular optimization problem, we would need to build the Jacobian (and maybe the Hessian too) of the objective function, and both of these require the derivatives of the objective function with respect to the parameters.

I hope you see that, through this simple process, whatever the number $N$ of data points, we end up with as many equations as there are parameters in the model.
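The two normal equations can then be solved directly as a small linear system; here is a sketch using NumPy, with the same kind of made-up data as before:

```python
import numpy as np

# Made-up data points for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
N = len(x)

# The normal equations: two linear equations in the two unknowns a and b.
A = np.array([[N,       x.sum()],
              [x.sum(), (x * x).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

# Solving this 2x2 system yields the parameters of the best-fit line.
a, b = np.linalg.solve(A, rhs)
print(a, b)  # intercept and slope of the regression line
```

Note that the size of the system is 2 (the number of parameters), not $N$ (the number of data points), no matter how much data we have.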

The same applies to nonlinear regression, such as $y = a + b\, e^{-c x}$: the same technique is used, but for such a case we end up with three nonlinear equations, which then require iterations to reach the solution.
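One common iterative scheme for such nonlinear problems is Gauss-Newton, which repeatedly linearizes the model and solves the resulting (linear) normal equations. The sketch below applies it to $y = a + b\,e^{-c x}$; the synthetic noiseless data, the starting guess, and the fixed iteration count are all assumptions for illustration, not part of the original answer:

```python
import numpy as np

# Synthetic, noiseless data generated from known parameters (for illustration).
true_a, true_b, true_c = 1.0, 2.0, 0.5
x = np.linspace(0.0, 5.0, 20)
y = true_a + true_b * np.exp(-true_c * x)

def residuals(p):
    a, b, c = p
    return a + b * np.exp(-c * x) - y

def jacobian(p):
    a, b, c = p
    e = np.exp(-c * x)
    # Partial derivatives of the model with respect to a, b, and c.
    return np.column_stack([np.ones_like(x), e, -b * x * e])

# Gauss-Newton: linearize, solve the linear normal equations, repeat.
p = np.array([0.8, 1.5, 0.6])  # starting guess, assumed reasonably close
for _ in range(20):
    J, r = jacobian(p), residuals(p)
    p = p - np.linalg.solve(J.T @ J, J.T @ r)

print(p)  # should approach the true parameters (1.0, 2.0, 0.5)
```

Unlike the linear case, the result now depends on the starting guess, and a poor one may prevent convergence; that is exactly the "iterations" the answer refers to.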

I hope this clarifies your question. If this is not the case, please post.