Is there a nice way to interpret this matrix equation that comes up in the context of least squares


So I am working on a problem that involves fitting a second degree polynomial of the form $y=a_1x^2+a_2x+a_3$ to four points using least squares. One part of the problem is to write out the matrix equation that describes the least squares problem. Basically we have the equation

$$\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \\ x_4^2 & x_4 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}$$

Let's call the matrix $A$ so that the problem is $A\mathbf{a}=\mathbf{y}$. Then the least squares equation is $A^TA\mathbf{a}=A^T\mathbf{y}$. If we write that out explicitly we get

$$\begin{bmatrix} \sum_{i=1}^4x_i^4 & \sum_{i=1}^4x_i^3 & \sum_{i=1}^4x_i^2 \\ \sum_{i=1}^4x_i^3 & \sum_{i=1}^4x_i^2 & \sum_{i=1}^4x_i \\ \sum_{i=1}^4x_i^2 & \sum_{i=1}^4x_i & 4 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^4x_i^2y_i \\ \sum_{i=1}^4x_iy_i \\ \sum_{i=1}^4y_i \end{bmatrix} $$

This new matrix makes no intuitive sense to me whatsoever but is striking. Why do we care about, for instance, the sum of the 4th powers of $x_i$? Is there a way to interpret why this is the way it is, maybe in terms of calculus or something else?
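For concreteness, here is a quick numerical sanity check (using NumPy; the sample points are made up for illustration) that building $A^TA$ directly really does reproduce the power-sum matrix above:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical x_i
y = np.array([1.0, 0.0, 2.0, 5.0])   # hypothetical y_i

# Design matrix with rows [x_i^2, x_i, 1]
A = np.column_stack([x**2, x, np.ones(4)])

# A^T A, entry by entry, is the matrix of power sums sum(x_i^k)
AtA = A.T @ A
powers = np.array([[(x**4).sum(), (x**3).sum(), (x**2).sum()],
                   [(x**3).sum(), (x**2).sum(), x.sum()],
                   [(x**2).sum(), x.sum(),      4.0]])
assert np.allclose(AtA, powers)

# Solving the normal equations gives the least squares coefficients
a = np.linalg.solve(AtA, A.T @ y)
```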


The normal equations for the least squares problem $Ax = b$ are given by $A^T Ax = A^T b$, as you clearly already know. It's not really the fourth powers of $x_i$ that are significant; what matters, from a purely linear-algebraic point of view, is the matrix $A^T A$ itself. One way to look at $A^T A$ is that it reduces the overdetermined problem $Ax = b$ (notice that you have three variables but four equations) down to an invertible problem, by taking each output $Ax$ and dotting it with the columns of $A$, giving you precisely $A^T Ax$. Ultimately the least squares problem can be phrased as:

Given $Ax = b$, find $\hat{b}$ in the column space of $A$ closest to $b$, and then find $\hat{x}$ such that $A\hat{x} = \hat{b}$. We call $\hat{x}$ a least squares solution.
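This projection view can be sketched numerically (NumPy again, with randomly generated data standing in for an arbitrary overdetermined system): $\hat{b} = A\hat{x}$ is the orthogonal projection of $b$ onto the column space of $A$, so the residual $b - \hat{b}$ is orthogonal to every column of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # tall matrix: 4 equations, 3 unknowns
b = rng.standard_normal(4)

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations
b_hat = A @ x_hat                           # projection of b onto col(A)

# The residual is orthogonal to the column space of A
assert np.allclose(A.T @ (b - b_hat), 0)
```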

There is, however, a calculus way of looking at it. From an optimization point of view, we can say that $\hat{x}$ is a least squares solution if $\|Ax - b\|_2^2 = (Ax - b)^T (Ax - b)$ is minimized. We can calculate when this is minimized by taking the derivative in $x$. Let $f(x) = \|Ax - b\|_2^2$. Then \begin{align} Df_x(y) & = \lim_{h \rightarrow 0} \frac{1}{h} (f(x+hy) - f(x)) \\ & = \lim_{h \rightarrow 0} \frac{1}{h} \left( \|Ax - b\|_2^2 + h(Ay)^T (Ax - b) + h(Ax - b)^T Ay + h^2 \|Ay\|_2^2 - \|Ax - b\|_2^2 \right) \\ & = (Ay)^T (Ax - b) + (Ax - b)^T Ay \\ & = y^T (A^T Ax - A^T b) + (A^T Ax - A^T b)^T y \end{align} where the $h^2 \|Ay\|_2^2$ term vanishes in the limit. We set this derivative to zero to solve for the minimum; this can only be zero for every choice of $y$ if $A^T Ax - A^T b = 0$. Thus we have derived the normal equations.
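A quick numerical check of this calculus argument (NumPy, with made-up data): at the least squares solution, the gradient expression $A^T Ax - A^T b$ vanishes, and a finite-difference directional derivative of $f$ in a random direction agrees.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# At the minimizer, A^T A x - A^T b = 0
grad = A.T @ A @ x_hat - A.T @ b
assert np.allclose(grad, 0)

# Finite-difference directional derivative of f(x) = ||Ax - b||^2
# in a random direction y should also be (approximately) zero
f = lambda x: np.sum((A @ x - b)**2)
y = rng.standard_normal(3)
h = 1e-6
fd = (f(x_hat + h*y) - f(x_hat - h*y)) / (2*h)
assert abs(fd) < 1e-4
```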