Matrix inversion to solve least squares problem


I want to solve a least squares problem of the form $\mathbf{A}\vec{x}=\vec{b}$, where $\mathbf{A}$ is an $m\times n$ matrix associated to the transformation $T:\mathbb{R}^n \to \mathbb{R}^m$, and where $\vec{b} \notin C(\mathbf{A})$, meaning that there is no exact solution for $\vec{x}$.

Even though $\mathbf{A}\vec{x}=\vec{b}$ has no solution, I know I can estimate the $\hat{\vec{x}}$ that minimizes the error, and that after some algebra (sorry, I have no reference at hand), it can be expressed as:

$$\mathbf{A}^{\intercal}\mathbf{A}\hat{\vec{x}} = \mathbf{A}^{\intercal}\vec{b}$$

My problem is that all textbooks I have come across stop here. They state that this new formulation of the problem has a solution (that, I understand), but they say nothing about solving for $\hat{\vec{x}}$. Typically I would go about it and solve like this:

$$\hat{\vec{x}} = (\mathbf{A}^{\intercal}\mathbf{A})^{-1}\mathbf{A}^{\intercal}\vec{b}$$

However, this requires that $\mathbf{A}^{\intercal}\mathbf{A}$ be invertible.
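When $\mathbf{A}^{\intercal}\mathbf{A}$ is invertible, the normal equations can be solved in code without ever forming the inverse explicitly. A minimal NumPy sketch (the matrix and data here are made up for illustration):

```python
import numpy as np

# A hypothetical overdetermined system: 4 equations, 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Normal equations: (A^T A) x_hat = A^T b.
# Solving the linear system is cheaper and numerically safer than
# computing (A^T A)^{-1} explicitly.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's dedicated least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both approaches agree here; `np.linalg.lstsq` is generally preferred in practice because it handles rank-deficient cases as well.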

Question: is $\mathbf{A}^{\intercal}\mathbf{A}$ always invertible? If so, how can I prove it?


Edit: for the particular case I'm interested in, $\mathbf{A}$ has more rows than columns, as it comes from an over-determined system.

Edit 2: I have also checked the formulation of my problem. All columns are linearly independent, meaning that for my particular case, $\mathbf{A}$ has full rank.

3 Answers

BEST ANSWER

By taking into account all the provided answers and (very useful) comments, I was able to refine my online search and stumbled upon this video by Khan Academy. With it, I was able to make further progress until I could write down a proof that I myself was able to understand. Although I'm not sure if it's OK to answer my own question, I wanted to post it nonetheless to (i) have it peer-reviewed and (ii) leave it here for the community.


Postulate

For any matrix $\mathbf{A}$ of size $m\times n$, the product of its transpose with itself, $\mathbf{A}^{\intercal}\mathbf{A}$, is invertible if $m\geq n$ and $\mathbf{A}$ has full rank, that is, if all $n$ columns of $\mathbf{A}$ are linearly independent.

Proof

Assuming that $\mathbf{A}_{m\times n}$ has full rank (i.e. all its columns are L.I.), we know that its nullspace will only contain the zero vector $\mathbf{0}_n$. $$ Rank(\mathbf{A})=n ~~\Leftrightarrow~~ N(\mathbf{A})=\{\mathbf{0}_n\} ~~\Leftrightarrow~~ \mathbf{A}\mathbf{x} = \mathbf{0}_m \text{ only for } \mathbf{x} = \mathbf{0}_n $$

Consequently, $\mathbf{A}^{\intercal}\mathbf{A}$ will also have full rank: it has dimensions $n \times n$ and the same rank as $\mathbf{A}$. The ranks coincide because the matrix multiplication $\mathbf{A}^{\intercal}\mathbf{A}$ produces linear combinations of the rows of $\mathbf{A}$ (equivalently, of the columns of $\mathbf{A}^{\intercal}$), and because column rank equals row rank, the number of L.I. rows (or columns) in $\mathbf{A}^{\intercal}\mathbf{A}$ is the same as in $\mathbf{A}$. Thus: $$ rank(\mathbf{A}^{\intercal}\mathbf{A}) = rank(\mathbf{A})=n $$

This can be proven rigorously by taking into account the following:

  • $\mathbf{v}^{\intercal}\mathbf{0}=0$, for any vector $\mathbf{v}$.

  • $\mathbf{u}^{\intercal} \mathbf{v}=\mathbf{u}\cdot\mathbf{v}$, for any two vectors, because of the equivalence between the dot product and matrix multiplication.

  • The matrix sizes are: $\mathbf{A}^{\intercal}_{n\times m} \mathbf{A}_{m\times n} \mathbf{v}_{n\times 1} = \mathbf{0}_{n\times 1}$.

$$\begin{aligned}
\mathbf{A}^{\intercal}\mathbf{A}\mathbf{v} &= \mathbf{0} \\
\mathbf{v}^{\intercal}\mathbf{A}^{\intercal}\mathbf{A}\mathbf{v} &= \mathbf{v}^{\intercal}\mathbf{0} = 0 \\
(\mathbf{A}\mathbf{v})^{\intercal}\mathbf{A}\mathbf{v} &= 0, \text{ where } \mathbf{A}\mathbf{v} \text{ is a vector with size } m\times 1 \\
(\mathbf{A}\mathbf{v})^{\intercal} \cdot (\mathbf{A}\mathbf{v}) &= 0 \\
\mid\mid \mathbf{A}\mathbf{v} \mid\mid^2 &= 0 \\
&\Rightarrow \mathbf{A}\mathbf{v} = \mathbf{0} \\
&\Rightarrow \text{if } \mathbf{v} \in N(\mathbf{A}^{\intercal}\mathbf{A}), \text{ then } \mathbf{v} \in N(\mathbf{A}), \text{ so } \mathbf{v} = \mathbf{0}, \text{ because } N(\mathbf{A})=\{\mathbf{0}\} \\
&\Rightarrow N(\mathbf{A}^{\intercal}\mathbf{A}) = \{ \mathbf{0}\}
\end{aligned}$$

Finally, we can conclude that $\mathbf{A}^{\intercal}\mathbf{A}$ is a square matrix with linearly independent columns, a sufficient condition to be invertible.
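As a quick numerical sanity check of the claim (not a proof), one can verify in NumPy that a tall full-column-rank matrix yields an invertible Gram matrix; the matrix below is randomly generated:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A tall matrix (m = 5 > n = 3); a random Gaussian matrix has full
# column rank with probability 1.
A = rng.standard_normal((5, 3))
gram = A.T @ A  # n x n Gram matrix

# rank(A^T A) == rank(A) == n, so the Gram matrix is invertible.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(gram))
```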

ANSWER

$\mathbf{A}^{\intercal}\mathbf{A}$ is certainly not always invertible. In fact, it has the same rank as $\mathbf{A}$. Thus if $\mathbf{A}$ has more columns than rows, $\mathbf{A}^{\intercal}\mathbf{A}$ can never be invertible. However, when doing least squares in practice, $\mathbf{A}$ will have many more rows than columns, so $\mathbf{A}^{\intercal}\mathbf{A}$ will have full rank, and thus be invertible, whenever the columns of $\mathbf{A}$ are linearly independent, which is the typical case.
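The wide case is easy to check numerically; a small sketch with a made-up $2\times 3$ matrix:

```python
import numpy as np

# More columns than rows: rank(A) <= 2, so the 3x3 Gram matrix
# A^T A cannot have rank 3 and is therefore singular.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
gram = A.T @ A

print(np.linalg.matrix_rank(gram))  # 2, not 3: singular
```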

ANSWER

However, this requires that $A^{T}A$ be invertible.

No, it doesn't. In practice $A^{T}A$ is often ill-conditioned, or outright singular, so the formula

$$ \hat{x} = (A^{T}A)^{-1}A^{T}b \tag{1} $$

is better replaced by the pseudo-inverse. You can write the matrix with the SVD

$$ A = U \Sigma V^{T} \tag{2}$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices, and $\Sigma \in \mathbb{R}^{m \times n}$ is the matrix of singular values $$ \Sigma = \textrm{diag}(\sigma_{1}, \cdots, \sigma_{r}, 0, \cdots, 0) \tag{3} $$

The pseudo-inverse is then given by the $ n \times m $ matrix $$ A^{\dagger} = V \Sigma^{\dagger} U^{T} \tag{4}$$

where $\Sigma^{\dagger}$ is

$$ \Sigma^{\dagger} =\textrm{diag}(\frac{1}{\sigma_{1}}, \cdots, \frac{1}{\sigma_{r}}, 0, \cdots, 0) \tag{5} $$

so that the candidate least squares solution is $$ x^{\dagger} = A^{\dagger}b = V \Sigma^{\dagger}U^{T} b \tag{6} $$

To see that this minimizes the residual, note that

$$ \| Ax -b \| = \| U\Sigma V^{T}x - b \| = \| \Sigma V^{T}x - U^{T}b \| \tag{7}$$

since $U$ is an isometry. If we let $ y = V^{T}x $, then $ \| x\| = \| y\| $ because $V$ is orthogonal, and $\| Ax-b\|$ is minimized iff $\| \Sigma y - U^{T} b \| $ is minimized. The minimizing $y$ is $$ y^{\dagger} = \Sigma^{\dagger}U^{T}b \tag{8}$$

Transforming back with $ x = Vy $ (which preserves the norm, $\|x\| = \|y\|$), we get

$$ x^{\dagger} = V \Sigma^{\dagger} U^{T} b = A^{\dagger}b \tag{9} $$
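The SVD recipe above, steps (2) through (6), can be sketched in NumPy; the data is made up, and `np.linalg.pinv` is used only as a cross-check:

```python
import numpy as np

# Made-up overdetermined system (4 equations, 2 unknowns).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Thin SVD: A = U @ diag(s) @ Vt, with s the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^dagger: invert singular values above a tolerance, zero the rest.
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_dag = np.where(s > tol, 1.0 / s, 0.0)

# x^dagger = V @ Sigma^dagger @ U^T @ b, equation (6).
x_dag = Vt.T @ (s_dag * (U.T @ b))

print(np.allclose(x_dag, np.linalg.pinv(A) @ b))  # True
```

Thresholding the singular values (rather than inverting all of them) is what makes this robust when $A^{T}A$ is singular or nearly so.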