I am taking a machine learning course, and I learned that in LS (least square), we have
$$\hat{\theta} = (\Phi\Phi^T)^{-1}\Phi y$$
But for RLS (regularized least square), we have
$$\hat{\theta} = (\Phi\Phi^T + \lambda \textbf{I})^{-1}\Phi y$$
$\Phi \in \mathbb R^{d \times n}$, where $d$ is the number of features and $n$ is the number of samples.
The professor mentioned that when $n \lt d$ (sample size smaller than the number of features), the $d \times d$ matrix $\Phi \Phi^T$ has rank at most $n$ and is therefore noninvertible, and the model will overfit (which I understand). So the extra term in RLS ($\lambda \textbf{I}$) acts as a regularization term to make sure the matrix
$$\Phi \Phi^T + \lambda \textbf{I}$$ is invertible. This part is where I get stuck. I don't see why adding an extra term $\lambda \textbf{I}$ can make the whole matrix invertible. Is there a proof for this?
You can prove it using positive definiteness. Note that for any $v\in \mathbb{R}^{d}$, $v^{T}(\Phi\Phi^{T})v=(\Phi^{T} v)^T(\Phi^{T} v)=\|\Phi^{T}v\|^2\geq0$, so $\Phi\Phi^{T}$ is positive semidefinite, and $\lambda I$ is positive definite because $\lambda>0$. Thus, for any nonzero $v$,
$$v^{T}(\Phi\Phi^{T}+\lambda I)v = v^{T}(\Phi\Phi^{T})v + \lambda v^{T}v \geq \lambda\|v\|^{2}>0.$$
Therefore $\Phi\Phi^{T}+\lambda I$ is positive definite. Every eigenvalue of a positive definite matrix is strictly positive, so its determinant is nonzero, and hence the matrix is invertible.
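If it helps to see this numerically, here is a small sketch (variable names and the specific $d$, $n$, $\lambda$ are just illustrative) showing that with $n \lt d$ the matrix $\Phi\Phi^T$ is rank-deficient while $\Phi\Phi^T + \lambda I$ has all eigenvalues at least $\lambda > 0$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 5, 3, 0.1                 # n < d, lambda > 0 (illustrative values)
Phi = rng.standard_normal((d, n))     # Phi is d x n, as in the question

A = Phi @ Phi.T                       # d x d, rank at most n < d, so singular
B = A + lam * np.eye(d)               # regularized version

print(np.linalg.matrix_rank(A))       # at most n = 3, less than d = 5
print(np.linalg.eigvalsh(B).min())    # >= lam > 0, so B is positive definite
```

The smallest eigenvalue of $\Phi\Phi^T + \lambda I$ is the smallest eigenvalue of $\Phi\Phi^T$ (which is $\geq 0$) plus $\lambda$, which is exactly why the shifted matrix is always invertible.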