I am asking this question in the context of regularization / ridge regression.
Let's say that there is a matrix A of dimension n x d, where n is the number of rows and d is the number of columns (n may or may not be larger than d).
Consequently, we cannot say in general whether $A^T A$ is invertible, whatever n and d are.
Let's say we have a diagonal matrix D (with all diagonal elements > 0), and we add it to $A^T A$ as follows:
$D + A^T A$
Can we say that the resulting matrix will always be invertible, irrespective of whether n is larger or smaller than d? Or can we say that adding a diagonal matrix to any matrix converts it into a full-rank matrix?
I have read in the literature that adding a diagonal matrix D to $A^T A$ 'regularizes' $A^T A$ so that it becomes invertible, irrespective of whether n is larger or smaller than d, but I am looking for a formal proof of this.
I assume we're working over $\Bbb R$, the real number system.
Since $\mathbf A$ is an $\mathbf n \times \mathbf d$ matrix, $\mathbf A^{\mathbf T} \mathbf A$ is a $\mathbf d \times \mathbf d$ matrix; thus the context indicates that $\text{size}(\mathbf D) = \mathbf d$. We may set
$\mathbf D = \text{diag}(\mu_1, \mu_2, \ldots, \mu_{\mathbf d}), \tag 0$
where $\mu_i > 0$, $1 \le i \le \mathbf d$.
Let
$0 \ne \mathbf x = (x_1, x_2, \ldots, x_{\mathbf d})^{\mathbf T} \in \Bbb R^{\mathbf d}; \tag 1$
then if $\langle \cdot, \cdot \rangle$ is the usual inner product on $\Bbb R^{\mathbf d}$, we have
$\langle \mathbf x, \mathbf A^{\mathbf T} \mathbf A \mathbf x \rangle = \langle \mathbf A \mathbf x, \mathbf A \mathbf x \rangle \ge 0; \tag 2$
furthermore, since the matrix $\mathbf D$ has only positive elements along its diagonal and zeroes elsewhere, and since $\mathbf x \ne 0$, we also have
$\langle \mathbf x, \mathbf D \mathbf x \rangle = \displaystyle \sum_{i=1}^{\mathbf d} \mu_i x_i^2 > 0; \tag 3$
then
$\langle \mathbf x, (\mathbf D + \mathbf A^{\mathbf T} \mathbf A ) \mathbf x \rangle = \langle \mathbf x, \mathbf D \mathbf x \rangle + \langle \mathbf x, \mathbf A^{\mathbf T} \mathbf A \mathbf x \rangle > 0 \tag 4$
as well. By (4), $\mathbf D + \mathbf A^{\mathbf T} \mathbf A$ is positive definite; we further see that (4) precludes the possibility that
$(\mathbf D + \mathbf A^{\mathbf T}\mathbf A) \mathbf x = 0 \tag 5$
for any nonzero $\mathbf x \in \Bbb R^{\mathbf d}$, since such an $\mathbf x$ would make the left-hand side of (4) vanish; thus $\mathbf D + \mathbf A^{\mathbf T}\mathbf A$ is nonsingular, hence invertible, no matter what the values of $\mathbf d, \mathbf n > 0$ may be; in other words, $\mathbf D + \mathbf A^{\mathbf T}\mathbf A$ is of full rank $\mathbf d$.
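As a numerical sanity check of the argument above (not a substitute for the proof), here is a minimal NumPy sketch. The helper name `check`, the random Gaussian $\mathbf A$, and the range chosen for the $\mu_i$ are all assumptions made for illustration. It compares the rank of $\mathbf A^{\mathbf T}\mathbf A$ with that of $\mathbf D + \mathbf A^{\mathbf T}\mathbf A$, and also reports the smallest eigenvalue of the sum, which by (2) and (3) should be at least $\min_i \mu_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

def check(n, d):
    """Compare A^T A with D + A^T A for a random n x d matrix A.

    Hypothetical helper for illustration only: A is Gaussian and the
    diagonal entries mu_i are drawn from [0.1, 2.0], so all mu_i > 0.
    """
    A = rng.standard_normal((n, d))
    mu = rng.uniform(0.1, 2.0, size=d)
    D = np.diag(mu)

    G = A.T @ A          # d x d, positive semidefinite, rank <= min(n, d)
    M = D + G            # d x d, positive definite by (4)

    rank_G = np.linalg.matrix_rank(G)
    rank_M = np.linalg.matrix_rank(M)
    min_eig_M = np.linalg.eigvalsh(M).min()  # M is symmetric, so eigvalsh applies
    return rank_G, rank_M, min_eig_M, mu.min()

for n, d in [(3, 5), (5, 5), (10, 5)]:
    rank_G, rank_M, min_eig_M, mu_min = check(n, d)
    print(f"n={n}, d={d}: rank(A^T A)={rank_G}, rank(D + A^T A)={rank_M}, "
          f"min eigenvalue={min_eig_M:.4f}, min mu={mu_min:.4f}")
```

In the case n < d the matrix $\mathbf A^{\mathbf T}\mathbf A$ is necessarily rank deficient, while the sum should still report rank d. In ridge regression one typically takes $\mathbf D = \lambda \mathbf I$ with $\lambda > 0$, which is the special case $\mu_i = \lambda$ for all $i$.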