The section https://en.wikipedia.org/wiki/Projection_(linear_algebra)#Formulas of the Wikipedia article on projections presents several formulas for the projection onto a subspace. I only know how to compute the projection of a vector onto a line.
If u is a unit vector on the line, then the projection is given by the outer product
$$P_u = uu^T$$
The formula I knew was $$\operatorname{Proj}_a(v)=\frac{a\cdot v}{\|a\|^2}a$$
How is this related to $P_u$, and how does one arrive at the formulas
$$P_A = A(A^TA)^{-1}A^T$$ and $$P_A = A(A^TDA)^{-1}A^TD$$?
The formal proof cited by Omnomnomnom in his comment uses the idea that the orthogonal projection of a vector onto a subspace is the element of that subspace that is nearest the vector. However, you can also view these matrix formulas as successive generalizations of the simpler vector formulas that you know.
Let’s back up a bit first. The projection of $\mathbf v$ onto a unit vector $\mathbf u$ is $(\mathbf v\cdot\mathbf u)\mathbf u$. By treating a $1\times1$ matrix as a scalar, this can be written as the matrix product $(\mathbf u^T \mathbf v)\mathbf u$. Strictly speaking, usually only left-multiplication of vectors by scalars is defined for a vector space, but we blithely ignore this and move scalars around at will, so we can further rearrange the previous expression as $(\mathbf u \mathbf u^T)\mathbf v$, giving the first of your projection matrix formulas. Every column of the matrix $\mathbf u\mathbf u^T$ is a multiple of $\mathbf u$, so it should be clear that the image of this map is spanned by $\mathbf u$.
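As a quick numerical sanity check (the vectors here are my own made-up example), we can verify that $(\mathbf u\mathbf u^T)\mathbf v$ and $(\mathbf v\cdot\mathbf u)\mathbf u$ agree, and that the resulting matrix is idempotent, as any projection must be:

```python
import numpy as np

# Project v onto the unit vector u two ways and compare:
# the matrix form (u u^T) v against the dot-product form (v . u) u.
u = np.array([3.0, 4.0]) / 5.0          # a unit vector
v = np.array([2.0, 1.0])

P_u = np.outer(u, u)                    # the projection matrix u u^T
proj_matrix = P_u @ v                   # (u u^T) v
proj_dot = (v @ u) * u                  # (v . u) u

print(np.allclose(proj_matrix, proj_dot))  # True
print(np.allclose(P_u @ P_u, P_u))         # True: projections satisfy P^2 = P
```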
Now suppose that we have an orthonormal basis $\{\mathbf u_1,\dots,\mathbf u_m\}$ for some subspace. The orthogonal projection of $\mathbf v$ onto this space is simply the sum of the individual projections onto the basis vectors: $$\sum_{k=1}^m (\mathbf u_k^T \mathbf v)\mathbf u_k = \sum_{k=1}^m \mathbf u_k (\mathbf u_k^T \mathbf v) = \begin{bmatrix}\mathbf u_1 & \cdots & \mathbf u_m\end{bmatrix} \begin{bmatrix}\mathbf u_1^T \mathbf v \\ \vdots \\ \mathbf u_m^T \mathbf v\end{bmatrix} = \begin{bmatrix}\mathbf u_1 & \cdots & \mathbf u_m\end{bmatrix} \begin{bmatrix}\mathbf u_1^T \\ \vdots \\ \mathbf u_m^T\end{bmatrix} \mathbf v.$$ Calling the matrix that has these basis vectors for its columns $U$, this says that the orthogonal projection matrix is $UU^T$.
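Here is a minimal sketch of the orthonormal-basis case, with a two-dimensional subspace of $\mathbb R^3$ (the basis vectors are my own choice): the sum of individual projections matches $UU^T\mathbf v$.

```python
import numpy as np

# An orthonormal basis {u1, u2} of a plane in R^3.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
U = np.column_stack([u1, u2])

P = U @ U.T                              # projection matrix U U^T
v = np.array([1.0, 2.0, 3.0])

# Sum of the individual projections onto u1 and u2.
sum_of_projs = (u1 @ v) * u1 + (u2 @ v) * u2
print(np.allclose(P @ v, sum_of_projs))  # True
```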
Another way to interpret the first sum above is that, if we extend this basis to a complete basis of the ambient space, the $k$th coordinate of $\mathbf v$ is the dot product $\mathbf u_k^T\mathbf v$. In this coordinate system, the projection onto the subspace is then obtained by setting all of the coordinates after the $m$th to zero. Multiplying by $U^T$ computes these dot products in bulk, producing the vector of coordinates of $\mathbf v$’s projection relative to this basis, and multiplying by $U$ converts these to the standard basis.
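This coordinate interpretation can also be sketched numerically (extending the same hypothetical basis as above with a third vector of my own choosing): compute all the coordinates, zero out those past the $m$th, and convert back.

```python
import numpy as np

# Extend {u1, u2} to a full orthonormal basis of R^3.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
u3 = np.array([0.0, 1.0, -1.0]) / np.sqrt(2)   # completes the basis
Q = np.column_stack([u1, u2, u3])

v = np.array([1.0, 2.0, 3.0])
coords = Q.T @ v              # coordinates of v relative to the new basis
coords[2] = 0.0               # zero the coordinate outside the subspace (m = 2)
proj = Q @ coords             # convert back to the standard basis

U = Q[:, :2]
print(np.allclose(proj, U @ U.T @ v))   # True: same as multiplying by U U^T
```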
Going back to projection onto a vector, if we want to project onto an arbitrary nonzero vector $\mathbf w$, we can apply the first formula by the simple expedient of normalizing $\mathbf w$: $$\left(\mathbf v\cdot {\mathbf w \over \|\mathbf w\|}\right) {\mathbf w \over \|\mathbf w\|} = {\mathbf v\cdot\mathbf w \over \mathbf w\cdot\mathbf w}\mathbf w = \left({\mathbf w \mathbf w^T \over \mathbf w^T\mathbf w}\right) \mathbf v.$$ The numerator of the parenthesized expression is a square matrix as before, while the denominator is a scalar—the square of $\mathbf w$’s norm. Similarly, if we have an orthogonal basis $\{\mathbf w_1,\dots,\mathbf w_m\}$ for a subspace, we can add up the individual projections: $$\sum_{k=1}^m \mathbf w_k {1\over\mathbf w_k^T\mathbf w_k}(\mathbf w_k^T\mathbf v) = \begin{bmatrix}\mathbf w_1&\cdots&\mathbf w_m\end{bmatrix} \begin{bmatrix} \frac1{\mathbf w_1^T\mathbf w_1} &&0 \\ &\ddots& \\ 0&& \frac1{\mathbf w_m^T\mathbf w_m} \end{bmatrix} \begin{bmatrix}\mathbf w_1^T\\\vdots\\\mathbf w_m^T\end{bmatrix} \mathbf v = W(W^TW)^{-1}W^T\mathbf v.$$ The new factor $(W^TW)^{-1}$ is a diagonal matrix that serves to normalize the basis vectors. (Its off-diagonal elements are all zero since the basis is orthogonal.) Observe that the shape of this formula is essentially the same as that for the projection onto a single vector. Also, we can interpret the parts of the formula the same way as we did for the orthonormal basis: $(W^TW)^{-1}W^T\mathbf v$ computes the $W$-coordinates of the projection of $\mathbf v$ and multiplying by $W$ converts them to the standard basis.
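A brief numerical sketch of the orthogonal-but-not-orthonormal case (again with vectors of my own choosing) shows the Gram matrix $W^TW$ coming out diagonal and the formula reducing to the sum of normalized projections:

```python
import numpy as np

# An orthogonal basis {w1, w2} whose vectors are not unit length.
w1 = np.array([2.0, 0.0, 0.0])
w2 = np.array([0.0, 3.0, 3.0])          # w1 . w2 = 0
W = np.column_stack([w1, w2])

gram = W.T @ W                          # diagonal, since the basis is orthogonal
P = W @ np.linalg.inv(gram) @ W.T

v = np.array([1.0, 2.0, 3.0])
sum_of_projs = (w1 @ v) / (w1 @ w1) * w1 + (w2 @ v) / (w2 @ w2) * w2
print(np.allclose(P @ v, sum_of_projs))            # True
print(np.allclose(gram, np.diag(np.diag(gram))))   # True: off-diagonal entries vanish
```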
Continuing on to an arbitrary basis for the subspace, we can no longer simply take dot products, suitably normalized, with the basis vectors to find the $W$-coordinates of the projection of $\mathbf v$. The coordinate directions are no longer independent in the sense that moving in the direction of one of the basis vectors can change the components in other basis vector directions, too, so we need a way to undo all of the “cross-talk” among them. The Gram matrix $W^TW$ consists of all of the pairwise dot products of these basis vectors, and it turns out that multiplying by its inverse untangles the interactions among them: the formula for the orthogonal projection matrix in the previous paragraph also works for any basis of the subspace. (Almacomet links to a nice blog post about this in his comment.) In fact, the projection matrix that results from this formula is independent of the choice of basis for the subspace.
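We can check this basis-independence numerically (with a random basis of my own construction): applying the formula to $W$ and to $WC$ for any invertible $C$, whose columns span the same subspace, yields the same matrix.

```python
import numpy as np

# A random (almost surely non-orthogonal) basis of a 2-D subspace of R^4.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))
P = W @ np.linalg.inv(W.T @ W) @ W.T

# Change basis: for invertible C, the columns of W C span the same subspace.
C = np.array([[1.0, 2.0], [0.0, 3.0]])
W2 = W @ C
P2 = W2 @ np.linalg.inv(W2.T @ W2) @ W2.T

print(np.allclose(P, P2))        # True: same projection matrix for either basis
print(np.allclose(P @ P, P))     # True: idempotent
print(np.allclose(P, P.T))       # True: symmetric, as an orthogonal projection must be
```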
Finally, what if we’re using something other than the standard Euclidean inner product, or equivalently, working with a non-orthonormal basis for the ambient space? In that case, there’s some fixed positive-definite matrix $A$ such that $\langle\mathbf v,\mathbf w\rangle = \mathbf w^TA\mathbf v$. If you incorporate this into the above derivations, you’ll find that the orthogonal projection of $\mathbf v$ relative to this inner product is $$W(W^TAW)^{-1}W^TA \mathbf v:$$ as one might have expected, the matrix $A$ appears in the formula in places that correspond to dot products.
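As a final sketch (the matrix $A$ and basis $W$ below are my own invented example), we can confirm that $W(W^TAW)^{-1}W^TA$ is idempotent and that the residual $\mathbf v - P\mathbf v$ is orthogonal to the subspace in the $\langle\cdot,\cdot\rangle$-sense, i.e. $W^TA(\mathbf v - P\mathbf v) = 0$:

```python
import numpy as np

# A symmetric positive-definite matrix defining <v, w> = w^T A v.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
# A basis of a 2-D subspace of R^3.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

P = W @ np.linalg.inv(W.T @ A @ W) @ W.T @ A

v = np.array([1.0, 2.0, 3.0])
r = v - P @ v                            # residual after projecting
print(np.allclose(P @ P, P))             # True: P is a projection
print(np.allclose(W.T @ A @ r, 0))       # True: residual is A-orthogonal to the subspace
```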