Proper way to use projection matrix equation


Introduction:

This lecture explains how linear algebraic methods can be used to solve linear regression problems.

From my understanding, if there is no solution to $Ax=b$, we can project $b$ onto the column space of $A$ and get $\hat{x}$, which will be the closest solution. For that, the error $e = b - A\hat{x}$ must be orthogonal to the column space of $A$ (that is, $e$ must lie in the null space of $A^T$, which is equivalent to this orthogonality), since then $A^Te = 0$ gives us the equation $A^TA\hat{x} = A^Tb$.

If we are working in a space of 3 or more dimensions and projecting onto a subspace of dimension 2 or higher (a plane or larger), spanned by the independent columns of $A$, then our projection can be represented as:

$P = A(A^TA)^{-1}A^T$
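
For reference, here is a sketch of where this formula comes from, as far as I understand it. It assumes the columns of $A$ are independent, so that $A^TA$ is invertible; $\hat{x}$ denotes the best approximate solution and $p$ the projection of $b$ onto the column space of $A$:

\begin{align}
A^T(b - A\hat{x}) &= 0 \\
A^TA\hat{x} &= A^Tb \\
\hat{x} &= (A^TA)^{-1}A^Tb \\
p = A\hat{x} &= A(A^TA)^{-1}A^Tb = Pb
\end{align}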

Problem:

At this point in the lecture, Professor Strang presents the linear function $y=C+Dt$ and its equation for each point in the graph, then proceeds to show the problem, which is:

$Ax = b$

Somehow, according to the graph, the variables are substituted with the following matrix and vectors:

$\begin{bmatrix}1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}\begin{bmatrix}C \\D\end{bmatrix} = \begin{bmatrix}1 \\ 2 \\ 3 \end{bmatrix}$

I don't understand the representation of the matrix $A$: why is the first column padded with ones? Is it padded specifically so that it works in the projection equation above, which expects $A$ to be in matrix form? If not, then what does it represent?

Question:

In short, what is the proper way to use this projection matrix equation ($P = A(A^TA)^{-1}A^T$)?

Thank you.

Answer:

If you multiply the matrix by the vector $\begin{bmatrix} C\\D\end{bmatrix}$, you see that the rows read \begin{align}C+D\cdot 1&=1 \\ C+D \cdot 2 &= 2 \\ C+D \cdot 3 &= 3. \end{align}

At this point, it should be pretty clear where the $1$ in the first column comes from.

In essence, you are searching for a linear combination of basis functions. The coefficients of that combination, here $C$ and $D$, go into the vector of unknowns. The basis functions that you are trying to combine, in this case the constant function $1$ and the linear function $t$, go into the matrix. Each column contains the values of one basis function at the points where you have data (here the evaluation points are $t=1,2,3$). The constant function is always $1$, which is where the ones come from.
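
To make this concrete, here is a minimal Python sketch of building the matrix column by column from the two basis functions and then using $P = A(A^TA)^{-1}A^T$. The helper names (`design_matrix`, `least_squares_line`, and so on) are made up for this illustration, and it uses exact fractions from the standard library rather than a numerical package. The second data set, with $y$-values $1, 2, 2$, is a hypothetical example that does not lie on a line; it is not taken from the question.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    # Plain row-by-column matrix product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("A^T A is singular: the columns of A are dependent")
    return [[d / det, -b / det], [-c / det, a / det]]


def design_matrix(ts):
    # One column per basis function: the constant 1 and the function t,
    # each evaluated at every data point.
    return [[Fraction(1), Fraction(t)] for t in ts]


def least_squares_line(ts, ys):
    A = design_matrix(ts)
    At = transpose(A)
    AtA_inv = inverse_2x2(matmul(At, A))
    b = [[Fraction(y)] for y in ys]
    x_hat = matmul(AtA_inv, matmul(At, b))   # (A^T A)^{-1} A^T b
    P = matmul(A, matmul(AtA_inv, At))       # A (A^T A)^{-1} A^T
    p = matmul(P, b)                         # projection of b
    return x_hat[0][0], x_hat[1][0], P, [row[0] for row in p]


# Data from the question: the points lie exactly on y = t.
C, D, P, p = least_squares_line([1, 2, 3], [1, 2, 3])
print(C, D)    # 0 1

# Hypothetical data that no single line passes through.
C2, D2, _, p2 = least_squares_line([1, 2, 3], [1, 2, 2])
print(C2, D2)  # 2/3 1/2
```

In the first call $b$ already lies in the column space of $A$, so the projection returns $b$ itself and the fit is exact. In the second call $p = Pb$ differs from $b$, and $C$, $D$ describe the line closest to the three points in the least-squares sense. Note that $P$ depends only on the evaluation points $t$, not on the measured values.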