Why does adding $\lambda \boldsymbol{I}$ to $\boldsymbol{X}^T\boldsymbol{X}$ for $\lambda > 0$ guarantee invertibility?

458 Views Asked by At

This question is inspired by regularized least squares, where it is stated that $$ X^TX + \lambda I $$ is guaranteed to be invertible for all $\lambda > 0$. Is there an intuitive reason for how adding a positive element along the diagonal of $X^TX$ guarantees invertibility?

I've tried to think about this in terms of full rank, determinants (> 0), eigenvalues (non-zero), but nothing is jumping out at me.

5

There are 5 best solutions below

1
On BEST ANSWER

It's more than just for some $\lambda > 0$, it's actually for all $\lambda > 0$. The eigenvalues of $X^\top X$ are real and non-negative, and adding $\lambda I$ increases each eigenvalue by $\lambda$ (with the same eigenvector). Thus, every eigenvalue of $X^\top X + \lambda I$ is at least $\lambda$, which makes them all strictly positive.

If the nullspace of a matrix is non-trivial, then $0$ is an eigenvalue. Contrapositively, since $0$ is not an eigenvalue, $X^\top X + \lambda I$ is invertible.

0
On

Eigenvalues is a nice way to think about it. If $A$ has eigenvalue $\mu$, $A+\lambda I$ has eigenvalue $\mu+\lambda$ (think about why). So if $\lambda$ is big enough, we can get all the eigenvalues higher than zero.

0
On

Suppose the matrices have all real eigenvalues. Then, $X^tX$ is a real symmetric matrix, hence is (orthogonally) diagonalizable (over $\Bbb{R}$). Say, $X^tX = Q^{-1}DQ$, for some diagonal $D$ and invertible $Q$ (actually, even orthogonal).

So, $X^tX + \lambda I = Q^{-1}(D + \lambda I) Q$. Notice that $D + \lambda I$ is a diagonal matrix, so it is invertible if and only if all the diagonals (which are its eigenvalues) are non-zero. So, if $\lambda$ is sufficiently large and positive (resp. negative) then all the entries of $D + \lambda I$ will be positive (rep. negative). So, $D+ \lambda I$ is invertible. Therefore, the entire product $X^tX + \lambda I = Q^{-1}(D + \lambda I)Q$ will be invertible.

0
On

It's even easier than the answers suggest: Let $\lambda \in \mathbb{R}$ be positive. If $v \in \mathbb{R}^n$ (where $n$ is the size of the matrix $X$) is nonzero, then $v^T \left(X^T X + \lambda I_n\right) v = \underbrace{v^T X^T X v}_{=\left(Xv\right)^T Xv = \left|Xv\right|^2 \geq 0} + \underbrace{\lambda}_{> 0} \underbrace{v^T v}_{= \left|v\right|^2 > 0} > 0$, so that $v^T \left(X^T X + \lambda I_n\right) v \neq 0$ and therefore $\left(X^T X + \lambda I_n\right) v \neq 0$. This shows that the $n\times n$-matrix $X^T X + \lambda I_n$ has trivial kernel, and therefore is invertible (since it is a square matrix over a field).

0
On

I assume $X$ is a real matrix operating on $\Bbb R^n$ equipped with the standard inner product $\langle \cdot, \cdot \rangle$.

$X^TX + \lambda I \tag 1$

is invertible if and only if

$\ker (X^TX + \lambda I) = \{0\}, \tag 2$

that is, if and only if

$(X^TX + \lambda I) \vec x \ne 0 \tag 3$

for all non-zero vectors $\vec x$. We have

$\langle \vec x, (X^TX + \lambda I) \vec x \rangle = \langle \vec x, X^TX \vec x + \lambda I \vec x \rangle = \langle \vec x, X^TX \vec x \rangle + \langle \vec x, \lambda I \vec x \rangle$ $= \langle X \vec x, X \vec x \rangle + \langle \vec x, \lambda \vec x \rangle = \langle X \vec x, X \vec x \rangle + \lambda \langle \vec x, \vec x \rangle; \tag 4$

now if

$x \ne 0, \tag 5$

then

$\langle \vec x, \vec x \rangle > 0, \tag 6$

and so with

$\lambda > 0, \tag 7$

$\lambda \langle \vec x, \vec x \rangle > 0; \tag 8$

since

$\langle X \vec x, X \vec x \rangle \ge 0 \tag 9$

for every vector $\vec x$, it follows from (4) that

$\langle \vec x, (X^TX + \lambda I) \vec x \rangle = \langle X \vec x, X \vec x \rangle + \lambda \langle \vec x, \vec x \rangle > 0; \tag{10}$

from this we conclude that $\vec x \ne 0$ implies

$(X^TX + \lambda I) \vec x \ne 0; \tag{11}$

and hence that

$\ker (X^TX + \lambda I) = \{0\}, \tag{12}$

and thus $X^TX + \lambda I$ is non-singular, and therefore invertible.