Why is $(\hat{X}^\top \hat{X} + \lambda n I)^{-1}\hat{X}^T \hat{y} = \hat{X}^T (\hat{X} \hat{X}^T + \lambda n I)^{-1} \hat{y}$ true?


I was trying to show the following:

$$(\hat{X}^T \hat{X} + \lambda n I)^{-1}\hat{X}^T \hat{y} = \hat{X}^T (\hat{X} \hat{X}^T + \lambda n I)^{-1} \hat{y}$$

I was told to use the Singular Value Decomposition of $\hat{X} = U \Sigma V^T = \sum^{r}_{i=1} \sigma_i u_i v_i^T$. So I tried:

$$ (\hat{X}^T \hat{X} + \lambda n I)^{-1} \hat{X}^T \hat{y} = (\hat{X}^T \hat{X} + \lambda n I)^{-1} (U \Sigma V^T)^\top \hat{y} $$

$$ ((U \Sigma V^T)^T (U \Sigma V^T) + \lambda n I)^{-1} V \Sigma^T U^T \hat{y} = (V \Sigma^T \Sigma V^T + \lambda n I)^{-1} V \Sigma^T U^T \hat{y} $$

However, after that step I got stuck, and it wasn't entirely obvious to me how to proceed. There are a couple of things confusing me:

  1. First, it's not entirely clear to me that the inverse $(\hat{X}^T \hat{X} + \lambda n I)^{-1} = (V \Sigma^T \Sigma V^T + \lambda n I)^{-1}$ even exists.
  2. Second, even if it were invertible, I'm not aware of any rules for inverses of sums of matrices (such rules exist for transposes, $(A + B)^T = A^T + B^T$, but I'm not sure about inverses and can't find anything useful).

Does anyone have any idea how to proceed, or how I could further use the SVD to show the equality?


There are 2 best solutions below

BEST ANSWER

Let $X$ be an $m\times k$ matrix, so that $X^T X$ is $k \times k$ and $X X^T$ is $m \times m$. You have $$(X^T X + \lambda n I_k) X^T = X^T (X X^T + \lambda n I_m),$$ since both sides expand to $X^T X X^T + \lambda n X^T$. When $-\lambda n$ is in the spectrum of neither $X^T X$ nor $X X^T$ (e.g. when $\lambda n > 0$), both shifted matrices are invertible and you may invert to get: $$ X^T(X X^T + \lambda n I_m)^{-1} = (X^T X + \lambda n I_k)^{-1}X^T .$$
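As a numerical sanity check (not a substitute for the proof), this "push-through" identity can be verified with NumPy on a random rectangular matrix; the dimensions and the value of $\lambda n$ below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 3      # X is m x k, so X^T X is k x k and X X^T is m x m
lam_n = 0.1      # lambda * n > 0, so both shifted matrices are invertible

X = rng.standard_normal((m, k))

# Left-hand side: (X^T X + lambda*n*I_k)^{-1} X^T  -> k x m
left = np.linalg.inv(X.T @ X + lam_n * np.eye(k)) @ X.T
# Right-hand side: X^T (X X^T + lambda*n*I_m)^{-1} -> k x m
right = X.T @ np.linalg.inv(X @ X.T + lam_n * np.eye(m))

assert np.allclose(left, right)
```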


If $\hat{X} = U\Sigma V^T$, then $$ \hat{X}^T\hat{X} + \lambda n I = V\Sigma^T\Sigma V^T + \lambda n I = V(\Sigma^T\Sigma + \lambda n I)V^T $$

Therefore, $(\hat{X}^T\hat{X} + \lambda n I)^{-1} = V(\Sigma^T\Sigma + \lambda n I)^{-1}V^T$. The matrix in parentheses is diagonal, so it is invertible as long as $\lambda n \neq -\sigma^2$ for every diagonal entry $\sigma$ of $\Sigma$ (including $\sigma = 0$ when $\hat{X}$ is rank-deficient); in particular, $\lambda n > 0$ suffices. Similarly,

$$ (\hat{X}\hat{X}^T + \lambda n I)^{-1} = U(\Sigma\Sigma^T + \lambda n I)^{-1}U^T $$

Thus we have

\begin{align} \hat{X}^T(\hat{X}\hat{X}^T + \lambda n I)^{-1} &= V\Sigma^T(\Sigma\Sigma^T + \lambda n I)^{-1}U^T\\ &= V(\Sigma^T\Sigma + \lambda n I)^{-1}\Sigma^TU^T\\ &= (\hat{X}^T\hat{X} + \lambda n I)^{-1}\hat{X}^T \end{align}

So essentially, using the SVD reduces the problem to the special case of diagonal matrices.
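The diagonalization in this answer can also be checked numerically. The sketch below (with arbitrary dimensions and regularization) verifies that $(\hat{X}^T\hat{X} + \lambda n I)^{-1} = V(\Sigma^T\Sigma + \lambda n I)^{-1}V^T$, where $\Sigma^T\Sigma$ is the $k \times k$ diagonal matrix of squared singular values, padded with zeros when $\hat{X}$ has rank less than $k$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 4, 6      # X is m x k with m < k, so X^T X is singular on its own
lam_n = 0.5
X = rng.standard_normal((m, k))

# Full SVD: U is m x m, Vt is k x k, s holds the min(m, k) singular values
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Diagonal of Sigma^T Sigma: squared singular values, zero-padded to length k
d = np.zeros(k)
d[: len(s)] = s**2

lhs = np.linalg.inv(X.T @ X + lam_n * np.eye(k))
rhs = Vt.T @ np.diag(1.0 / (d + lam_n)) @ Vt  # V (Sigma^T Sigma + ln I)^{-1} V^T

assert np.allclose(lhs, rhs)
```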