Confused regarding the interpretation of $A$ in the least squares normal equations $A^TA\hat x = A^Tb$


So I was watching Gilbert Strang's lecture to refresh my memory on least squares, and there's something that's confusing me (timestamp included).

In the 2D case he has $A=\begin{bmatrix}1&1\\1&2\\1&3\end{bmatrix}$. In the lecture he talks about how $A$ spans a subspace of our $n$-dimensional (in this case $n=2$) space.

If you look at the top left of the chalkboard at that timestamp, you will see he has written $A=[a_{1},a_{2}]$. There he was talking about the 3D case, and the $a_i$ were vectors that spanned a plane onto which we wanted to project.

My problem is, I'm not quite sure how I'm supposed to understand the values in the 2D case. Clearly the second column is the $x$-values, but can one say they span a space? The first column is obviously just the constant term of our linear equation, but how could one interpret that in the context of $A$ spanning a subspace of our 2D world?

Basically, I find a contradiction between how he views 2D and how he treats higher dimensions. I don't see how it makes sense for $A$ to be made of two spanning vectors as columns in 3D, while in 2D it is made of two columns that don't obviously span anything.

There are 2 answers below.

BEST ANSWER

Ignore the fact that the question arose from linear regression. Just think about the space spanned by the vectors $[1,1,1]^T$ and $[1,2,3]^T$. This is a 2D plane living inside 3D space. Solving the least squares form of $Ax=b$ amounts to finding the vector on this plane which is closest to $b$ and writing it as a linear combination of $[1,1,1]^T$ and $[1,2,3]^T$.
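A minimal NumPy sketch of this geometric picture (the target vector $b$ here is an assumed example, not from the lecture):

```python
import numpy as np

# The two columns span a 2D plane living inside 3D space.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])  # assumed example vector, generally not on the plane

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# p is the point on the plane closest to b, written as a linear
# combination of [1,1,1]^T and [1,2,3]^T with coefficients x_hat.
p = A @ x_hat

# The residual b - p is orthogonal to both spanning vectors.
print(x_hat)            # coefficients of the combination
print(p)                # closest point on the plane
print(A.T @ (b - p))    # ≈ [0, 0]
```

The last line is the defining property of the projection: the error $b - p$ is perpendicular to the plane, which is exactly what the normal equations encode.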

ANSWER

You have an $n\times 2$ matrix: $$ A = \begin{bmatrix} 1 & a_1 \\ 1 & a_2 \\ 1 & a_3 \\ \vdots & \vdots \\ 1 & a_n \end{bmatrix} $$ The two columns span a $2$-dimensional subspace of an $n$-dimensional space. Your scatterplot is $\{(a_i,b_i) : i = 1,\ldots, n\}$; it is a set of $n$ points in a $2$-dimensional space. The least-squares estimates $\hat x_1$ and $\hat x_2$ are those that minimize the sum of squares of residuals: $$ \sum_{i=1}^n (\hat x_1 + \hat x_2 a_i - b_i)^2. $$ The vector of "fitted values" has entries $\hat b_i = \hat x_1 + \hat x_2 a_i$ for $i=1,\ldots,n$.

The vector $\hat{\vec b}$ of fitted values is the orthogonal projection of $\vec b$ onto the column space of the matrix $A$.
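This can be checked numerically with a small NumPy sketch (the scatterplot data below is assumed for illustration): the fitted values from least squares coincide with the orthogonal projection $A(A^TA)^{-1}A^T\vec b$ onto the column space of $A$.

```python
import numpy as np

# Assumed example scatterplot: points (a_i, b_i), i = 1..n
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 3.0, 5.0, 4.0])

# n x 2 design matrix: a column of ones and the column of a-values
A = np.column_stack([np.ones_like(a), a])

# Least-squares estimates minimizing the sum of squared residuals
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Fitted values: b_hat_i = x1 + x2 * a_i
b_hat = A @ x_hat

# The same vector via the projection matrix A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(b_hat, P @ b))  # True
```

In practice one would use `lstsq` (or solve the normal equations) rather than forming the projection matrix explicitly; the point here is only that the two computations agree.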