Deriving the Normal Equation used to solve Least Squares Problems

Question

Deriving the Normal Equation used to solve Least Squares Problems

3.3k Views Asked by Bumbble Comm At 17 May 2026 - 6:00

I am studying for my linear algebra final and it was suggested to me that I learn how to derive the normal equation used in solving least squares problems. I have been looking in various places online but I haven't had any luck, probably because I don't know the topic well enough to put the notation my professor uses into what I find. Note: the notation for what I'm using is directly from my class and I don't have a textbook to reference because he doesn't use one.

Here is an example from my class and I'm trying to use it to generate a general example that I could use to derive the normal equation $A^TA\vec x=A^T\vec b$

Given this equation $N=a_0t_0+a_1t_1$, and a graph with 4 collected data points in the form $(t_i,n_i)$, I created this error function

$E_2\begin{bmatrix}a_1\\a_2\end{bmatrix}= (a_0+a_1t_1-n_1)^2+(a_0+a_1t_2-n_2)^2+(a_0+a_1t_3-n_3)+(a_0+a_1t_4-n_4)=e_i^2+e_2^2+e_3^2+e_4^2=\Vert e\Vert= \Vert A\vec x-\vec b\Vert $

The summation I have is: $$\sum_{i=1}^4 e_i^2 = (a_0+a_1t_i-n_i)^2$$

Since this is for a specific case I know I need to find a general error function to work with in order to derive the normal equation, so I created this one:

$$E_2\begin{bmatrix}a_1\\a_2\\.\\.\\.\\a_n\end{bmatrix}=\sum_{i=1}^n e_i^2 = (a_0+a_1t_i+a_2t_i^2+...a_nt_i^ n-n_i)^2$$

I'm not even sure that I need this information to do my derivation. What do I need to do?

Original Q&A

There are 2 best solutions below

Bumbble Comm On 25 Feb 2017 - 10:24

@Claude Leibovici leads us to the normal equations using the calculus of minimization. Another approach based on linear algebra follows.

We are given the system matrix $A\in\mathcal{C}^{m\times n}$ and a data vector $b\in\mathcal{C}^{m}$. The goal is to find the solution vector $x\in\mathcal{C}^{n}$. If the vector $b$ is not in the image space of $A$, there is no direct solution to $Ax=b$. In other words, $b\in\mathcal{R}(A)\oplus\mathcal{N}(A^{*})$, has a null space component.

We can transform the problem to cure this malady. By multiplying both sides of the equation by $A^{*}$, we form the normal equations $$ A^{*} A x = A^{*}b. $$ The new data vector $\beta = A^{*}b\in\mathcal{C}^{n}$, and the new solution vector $\eta=A^{*}x\in\mathcal{C}^{m}$. The new system is $$ A^{*} \eta = \beta. $$

By this deliberate construction, the data vector $\beta$ is in the image of $A^{*}$. That is, we can express $\beta$ as a combination of the columns of $A^{*}$. The prescription is given in terms of $A^{*}_{k}, $the column vectors of $A^{*}$: $$ \beta = \eta_{1} A^{*}_{1} + \eta_{2} A^{*}_{2} + \cdots + \eta_{n} A^{*}_{n} $$

Now demonstrate that the normal equations solution is also the least squares solution. The idea is to show the normal equations solution minimizes the sum of the squares of the residuals given by $$ r^{2} = \min_{x\in\mathcal{C}^{n}}\lVert Ax - b \rVert_{2}^{2}. $$

Let $x_{*}$ be the solution to the normal equations: $$ x_{*} = \{x\in\mathcal{C}^{n} | A^{*}A x - A^{*} b = 0 \}. $$

The task is to evaluate the residuals and the trick is to fashion a useful zero vector inside the norm: $$ \begin{align} % \lVert A x - b \rVert_{2}^{2} % & = \lVert A x + \underbrace{Ax_{*} - Ax_{*}}_{0}- b \rVert_{2}^{2} \\ & = \lVert (A x_{*} - b) + A(x - x_{*}) \rVert_{2}^{2} \\ & = \lVert Ax_{*} - b \rVert_{2}^{2} + \lVert A(x - x_{*}) \rVert_{2}^{2} - 2(x - x_{*})^{*} \underbrace{(A^{*} A x_{*} - A^{*} b)}_{0} \\ & = \lVert Ax_{*} - b \rVert_{2}^{2} + \lVert A(x - x_{*}) \rVert_{2}^{2} \end{align} $$ The conclusion is the sum of the squares of the residuals are minimized when $x = x_{*}$. Therefore, the solution to the normal equations is also the least squares solution.

**Bumbble Comm** · Accepted Answer

Admitting that I properly understood the problem

Considering that you have $N$ data points $(t_i,n_i)$ you "almost" properly wrote the error function (I suppose that missing the second summation symbol is just a typo).

Let us call it $SSQ$ (standing for the sum of the squares) as $$SSQ=\sum_{i=1}^N e_i^2 =\sum_{i=1}^N (a_0+a_1t_i-n_i)^2$$ is tha quantity you want to minimize and, as usual, we shall use its derivatives with respect to the unknow parameters $(a_0,a_1)$ and set them equal to $0$.

So $$\frac{d\, SSQ}{da_0}=\sum_{i=1}^N 2(a_0+a_1t_i-n_i)=0$$ $$\frac{d\, SSQ}{da_1}=\sum_{i=1}^N 2t_i(a_0+a_1t_i-n_i)=0$$ Since the $2$ multiply every term, let us forget them and distribute the summations. $$\frac{d\, SSQ}{da_0}=0\implies a_0\sum_{i=1}^N 1+a_1\sum_{i=1}^N t_i-\sum_{i=1}^N n_i=0$$ $$\frac{d\, SSQ}{da_1}=0\implies a_0\sum_{i=1}^N t_i+a_1\sum_{i=1}^N t_i^2-\sum_{i=1}^N n_it_i=0$$

Putting the unknowns in the lhs and the remaining in the rhs, we then have $$a_0 N+a_1\sum_{i=1}^N t_i=\sum_{i=1}^N n_i$$ $$a_0 \sum_{i=1}^N t_i+a_1\sum_{i=1}^N t_i^2=\sum_{i=1}^N n_it_i$$ These are the normal equations; they are just linear and easy to solve for $a_0,a_1$.

Deriving the Normal Equation used to solve Least Squares Problems

There are 2 best solutions below

Related Questions in LINEAR-ALGEBRA

Related Questions in LEAST-SQUARES

Trending Questions

Popular # Hahtags

Popular Questions