In sparse ridge regression, why do we have this property?


In ridge regression, the fitted values are $\hat y = X(X^TX+\lambda I)^{-1}X^Ty$, where $X$ is the covariate matrix with $n$ rows and $p$ columns. My teacher says that we can use the SVD to rewrite this formula as $\hat y = (XX^T+\lambda I)^{-1}XX^Ty$.

I do not see how to get from the first formula to the last one. Could you give me more details?
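
For reference, here is a small numerical sanity check of the identity $X(X^TX+\lambda I)^{-1}X^Ty = (XX^T+\lambda I)^{-1}XX^Ty$, which is the equality I am trying to understand. It is only a sketch: the function names, the random data, and the value of $\lambda$ are my own illustrative choices, not something from the course. The third function writes the thin SVD $X = UDV^T$ and uses $\hat y = U\,\mathrm{diag}\big(d_i^2/(d_i^2+\lambda)\big)\,U^Ty$, which is what both sides appear to reduce to when $\lambda > 0$.

```python
import numpy as np

def ridge_fit_primal(X, y, lam):
    # y_hat = X (X^T X + lam I_p)^{-1} X^T y, solving a p x p system
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_fit_dual(X, y, lam):
    # y_hat = (X X^T + lam I_n)^{-1} X X^T y, solving an n x n system
    n = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(n), X @ (X.T @ y))

def ridge_fit_svd(X, y, lam):
    # Thin SVD X = U diag(d) V^T; each component of U^T y is shrunk
    # by the factor d_i^2 / (d_i^2 + lam)
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    shrink = d**2 / (d**2 + lam)
    return U @ (shrink * (U.T @ y))

# Illustrative data (hypothetical): n = 6 observations, p = 4 covariates
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
y = rng.standard_normal(6)
lam = 0.5

y_primal = ridge_fit_primal(X, y, lam)
y_dual = ridge_fit_dual(X, y, lam)
y_svd = ridge_fit_svd(X, y, lam)

print(np.allclose(y_primal, y_dual), np.allclose(y_primal, y_svd))
```

The three versions should agree up to floating-point error, so the formulas do seem to be equivalent numerically; what I am missing is the algebraic argument.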