Can $\| \left( X'X + \lambda I \right) ^{-1} X'y \| = t$ be solved for $\lambda$?

447 Views Asked by At

In this post I suggested that the expression

$$ \| \left( X'X + \lambda I \right) ^{-1} X'y \| = t $$

couldn't be easily solved for $\lambda$, because you need to "invert" the norm. But in general my knowledge of advanced arithmetic tricks is poor.

Can this expression be solved for $\lambda$? If so, how would you do it?

2

There are 2 best solutions below

0
On BEST ANSWER

You can start by writing a singular value decomposition (SVD) for matrix $X$. For any matrix $X$ (not necessarily square), the SVD is $$ X = U S V', $$ where $U$, $V$ are orthogonal matrices, and $S$ is a diagonal matrix with entries $\sigma_i \geq 0$. (Quick test: in Matlab, SVD is computed in a couple of seconds for $X$ a random matrix of size $2000 \times 3000$. Don't know how huge your actual problem is.)

Then, plugging the SVD in your formula, you get $$ \| \left( X'X + \lambda I \right) ^{-1} X'y \|^2 = ... = y' U D_\lambda D_\lambda U' y = z_\lambda' z_\lambda = \| z_\lambda \|^2, $$ where $D_\lambda$ is a diagonal matrix with entries $$ d_i = \frac{\sigma_i}{(\sigma_i^2 + \lambda)}, $$ and $z_\lambda = D_\lambda U' y$. It can be seen that $ \| z_\lambda \|^2 $ gets its largest value when $\lambda = 0$: $$ \| z_0 \|^2 = \sum_{i=1}^r \frac{(u_i' y)^2}{\sigma_i^2},$$ where $u_i$ is the $i$th column of $U$, and $i = 1, ..., r$ denotes the sum over all $\sigma_i > 0$ (i.e., positive singular values). This value can be computed explicitly, if you can compute the SVD. Then it can be seen that if $t^2 > \| z_0 \|^2$, your equation surely has no solution for any $\lambda > 0$.

How to solve $\lambda$ in the original equation? Again, if you can compute the SVD, you have the equation for $\lambda$ $$ f(\lambda) = \| z_\lambda \|^2 = \sum_{i=1}^r \frac{\sigma_i^2 (u_i' y)^2}{(\sigma_i^2 + \lambda)^2} = t^2,$$ where $\sigma_i$ and $u_i' y$ can be "precomputed", as they do not depend on $\lambda$. (Expanding by $\Pi_i (\sigma_i^2 + \lambda)^2$ gives a large polynomial, so you are not going to get an exact solution). Using the above equation, $f'(\lambda)$ is computable and you can use e.g. Newton's method to solve $f(\lambda) = t^2$. Assuming that the above calculation is correct, you should get the same method as in the other answer...

1
On

In principle, the left side of the equation $y' X (X'X + \lambda I)^{-2} X' y = t^2$ is a rational function of $\lambda$. However, if your matrices are large you won't want to evaluate this explicitly. I would suggest using Newton's method. Note that $$ \dfrac{d}{d\lambda} (X'X + \lambda I)^{-2} = - 2 (X'X + \lambda I)^{-3}$$ so the iteration is $$\lambda_{n+1} = \lambda_n - \frac{\|(X'X + \lambda_n I)^{-1} X' y\|^2 - t^2}{-2 y' X (X' X + \lambda_n I)^{-3} X' y} $$ Of course, you needn't compute any inverse matrices, rather $(X' X + \lambda_n I)^{-1} X' y$ is a solution of $(X' X + \lambda_n I) x = X' y$ etc.