Ridge regression estimator in high dimensions: is $(X^TX + \epsilon I_p)^{-1}X^Ty$ finite as $\epsilon \rightarrow 0$?


Consider the ridge regression estimator $$\hat{\beta}_{\epsilon} := (X^TX + \epsilon I_p)^{-1}X^Ty$$ where $X$ is an $n$ by $p$ matrix with $n < p$. Let $\| \hat{\beta}_{\epsilon} \|_{1} := \sum_{j=1}^p |\hat{\beta}_{j,\epsilon}|$. Is $ \limsup_{\epsilon \rightarrow 0}\| \hat{\beta}_{\epsilon} \|_{1}$ finite?

Edit: assume further that every column of $X$ is a non-zero ($n$ by $1$) vector. I think this is sufficient to ensure that $\limsup_{\epsilon \rightarrow 0}\| \hat{\beta}_{\epsilon} \|_{1}$ is finite, but I do not yet have a formal argument.


BEST ANSWER

We have $\lim_{\epsilon\to0^+}(X^TX + \epsilon I_p)^{-1}X^T=X^+$, the Moore-Penrose pseudoinverse of $X$; this is a standard characterization of $X^+$, which can be read off directly from the SVD of $X$. Therefore $$ \limsup_{\epsilon\to0}\|(X^TX + \epsilon I_p)^{-1}X^Ty\|_1=\|X^+y\|_1<\infty. $$
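As a quick numerical sanity check (not part of the proof), one can compare the ridge solution with the pseudoinverse solution for small $\epsilon$; the dimensions, random data, and $\epsilon$ grid below are arbitrary illustrative choices:

```python
import numpy as np

# Numerical sketch: the ridge estimator (X^T X + eps I)^{-1} X^T y
# should approach the Moore-Penrose solution X^+ y as eps -> 0,
# even in the high-dimensional setting n < p.
rng = np.random.default_rng(0)
n, p = 5, 10                       # n < p
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_pinv = np.linalg.pinv(X) @ y  # minimum-norm least-squares solution X^+ y

for eps in [1e-2, 1e-5, 1e-8]:
    beta_eps = np.linalg.solve(X.T @ X + eps * np.eye(p), X.T @ y)
    print(f"eps={eps:.0e}  ||beta_eps - X^+ y||_1 = "
          f"{np.linalg.norm(beta_eps - beta_pinv, 1):.2e}")
```

The printed $L^1$ distances shrink with $\epsilon$, consistent with the limit above.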

ANOTHER ANSWER

Yes, the idea is to express the solution $\hat{\beta}$ in terms of the SVD of $X$.

Let $X=U\Sigma V^t$ where $U$ ($n$ by $n$) and $V$ ($p$ by $p$) are orthogonal and $\Sigma$ ($n$ by $p$) is diagonal. Then

$(\epsilon I_p+X^tX)^{-1}=(\epsilon I_p+V\Sigma^t\Sigma V^t)^{-1}=(V\epsilon I_pV^t+V\Sigma^t\Sigma V^t)^{-1}=(V(\epsilon I_p+\Sigma^t\Sigma)V^t)^{-1}=V(\epsilon I_p+\Sigma^t\Sigma)^{-1}V^t$

Therefore the regularized solution is given by

$\hat{\beta}_{\epsilon}=V(\epsilon I_p+\Sigma^t\Sigma)^{-1}V^tX^ty=V(\epsilon I_p+\Sigma^t\Sigma)^{-1}V^tV\Sigma^tU^ty=V(\epsilon I_p+\Sigma^t\Sigma)^{-1}\Sigma^tU^ty=:M_{\epsilon}y$

where $M_{\epsilon}=V(\epsilon I_p+\Sigma^t\Sigma)^{-1}\Sigma^tU^t$. Thus to show that the norm of $\hat{\beta}_{\epsilon}$ is bounded, it suffices to show that the singular values of $M_{\epsilon}$ do not blow up as $\epsilon \to 0$. But the expression above is (up to ordering of the singular values) an SVD of $M_{\epsilon}$, from which we can read off that its singular values are $\sigma_i/(\sigma_i^2+\epsilon)$, where the $\sigma_i$ are the diagonal entries of $\Sigma$ (i.e. the singular values of $X$). For each $i$, either $\sigma_i=0$, in which case $\sigma_i/(\sigma_i^2+\epsilon)=0$, or $\sigma_i>0$, in which case $\sigma_i/(\sigma_i^2+\epsilon)\leq 1/\sigma_i$. Therefore $\|M_{\epsilon}\|_2=\sigma_{\max}(M_{\epsilon})\leq 1/\sigma_{\min>0}(X)$, the reciprocal of the smallest non-zero singular value of $X$; in particular, the bound is independent of $\epsilon$.
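The singular-value formula and the $\epsilon$-independent bound can be checked numerically; the matrix sizes and random data below are illustrative choices:

```python
import numpy as np

# Numerical sketch: the singular values of M_eps = (X^T X + eps I)^{-1} X^T
# should be sigma_i / (sigma_i^2 + eps), hence bounded by the reciprocal of
# the smallest non-zero singular value of X, uniformly in eps.
rng = np.random.default_rng(1)
n, p = 4, 8                                # illustrative sizes, n < p
X = rng.standard_normal((n, p))
sig = np.linalg.svd(X, compute_uv=False)   # singular values of X (all non-zero here)

for eps in [1e-1, 1e-4, 1e-8]:
    M_eps = np.linalg.solve(X.T @ X + eps * np.eye(p), X.T)  # p-by-n matrix
    sv = np.linalg.svd(M_eps, compute_uv=False)              # n singular values
    predicted = np.sort(sig / (sig**2 + eps))[::-1]
    print(eps,
          np.allclose(sv, predicted, rtol=1e-4),     # formula matches?
          sv.max() <= 1.0 / sig.min() + 1e-9)        # uniform bound holds?
```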

Therefore, boundedness of $\hat{\beta}_{\epsilon}$ in the $L^2$ norm follows immediately, and boundedness in the $L^1$ norm follows from the equivalence of norms on $\mathbb{R}^p$: by Cauchy-Schwarz, $\|v\|_1 \leq \sqrt{p}\,\|v\|_2$ for every $v \in \mathbb{R}^p$.
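The norm-equivalence step used here can be illustrated with an arbitrary vector:

```python
import numpy as np

# Cauchy-Schwarz gives ||v||_1 <= sqrt(p) * ||v||_2 on R^p, so an
# eps-independent L2 bound on beta_eps yields an eps-independent L1 bound.
rng = np.random.default_rng(2)
p = 10
v = rng.standard_normal(p)       # arbitrary example vector
print(np.linalg.norm(v, 1) <= np.sqrt(p) * np.linalg.norm(v, 2))
```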