Replacing $I$ with S.P.D. $B$ in ridge regression, is the norm still bounded by OLS?


Assuming invertible $(\mathbf{A}^{\top}\mathbf{A})$, $\mathbf{A}\in\mathbb{R}^{n\times m}$ and $\gamma\geq1$, the norm of the ridge regression solution $\mathbf{x}_{\gamma} = (\mathbf{A}^{\top}\mathbf{A} + \gamma\mathbf{I})^{-1}\mathbf{A}^{\top}\mathbf{y}$ can be shown to be upper bounded by the norm of the OLS solution $\mathbf{x}_0 = (\mathbf{A}^{\top}\mathbf{A})^{-1}\mathbf{A}^{\top}\mathbf{y}$, as shown here $$ \|\mathbf{x}_{\gamma}\| \leq \|\mathbf{x}_0 \|\;. $$

I was wondering if a similar bound can be shown for the norm of $$ \mathbf{x'}_{\gamma} = (\mathbf{A}^{\top}\mathbf{A} + \gamma\mathbf{B})^{-1}\mathbf{A}^{\top}\mathbf{y}\;, $$ for some symmetric, positive semi-definite (or positive definite) $\mathbf{B}$. Since $\mathbf{x'}_{\gamma}$ is a more general form of $\mathbf{x}_{\gamma}$, it seems intuitive to me that this should be possible. But for a similar problem that I posted a few days ago no such upper bound could be shown.

I'll post a supporting simulation experiment later today!
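In the meantime, a minimal sketch of such a simulation (my own, assuming NumPy, random Gaussian $\mathbf{A}$, $\mathbf{y}$, and a random SPD $\mathbf{B}$; the problem sizes and the number of trials are arbitrary choices) could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(m, rng):
    """Random symmetric positive definite matrix (shifted Gram matrix)."""
    Q = rng.standard_normal((m, m))
    return Q @ Q.T + 0.1 * np.eye(m)

worst = 0.0
for _ in range(1000):
    n, m = 8, 4
    A = rng.standard_normal((n, m))       # tall A, so A^T A is invertible a.s.
    y = rng.standard_normal(n)
    B = random_spd(m, rng)
    gamma = rng.uniform(1.0, 10.0)
    x0 = np.linalg.solve(A.T @ A, A.T @ y)             # OLS solution
    xg = np.linalg.solve(A.T @ A + gamma * B, A.T @ y)  # generalized ridge
    worst = max(worst, np.linalg.norm(xg) / np.linalg.norm(x0))

# If this exceeds 1, the conjectured bound ||x'_gamma|| <= ||x_0|| fails.
print("largest ||x'_gamma|| / ||x_0|| observed:", worst)
```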


Edit: Alternatively, showing the following would also be sufficient for my purposes $$ \|\mathbf{x}''_{\gamma}\| \leq \|\mathbf{x}''_{0} \|\;,\;\;\;\;\mathbf{x}''_{\gamma} = (\mathbf{A}^{\top}\mathbf{A} + \gamma\mathbf{K}^{\top}\mathbf{K})^{-1}\mathbf{A}^{\top}\mathbf{y}\;, $$ where the rows of $\mathbf{K}\in\mathbb{R}^{k\times m}$ are orthonormal, and $k\leq m$.


There are 2 best solutions below


I will consider only the case where $B$ is invertible. We have $$ (A^TA + \gamma B)^{-1} A^Ty = \\ B^{-1/2}([AB^{-1/2}]^T[AB^{-1/2}] + \gamma I)^{-1} [AB^{-1/2}]^T y = \\ B^{-1/2}(M^TM + \gamma I)^{-1}M^Ty, \quad M = AB^{-1/2}. $$ Applying your initial result tells us that $$ \|(M^TM + \gamma I)^{-1}M^Ty\| \leq \|(M^TM)^{-1}M^Ty\| \implies\\ \|B^{1/2}x_\gamma'\| \leq \|(M^TM)^{-1}M^Ty\|. $$ Note that $z = (M^TM)^{-1}M^Ty$ is the minimizer of $\min_z \|AB^{-1/2}z - y\|$, which means that $x_0 = B^{-1/2}z$ is the minimizer of $\min_{x} \|Ax - y\|$. So, $z = B^{1/2}x_0$ and we have $$ \|B^{1/2} x_\gamma'\| \leq \|B^{1/2}x_0\|. $$
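This reparametrization can be checked numerically (a sketch assuming NumPy; $B^{1/2}$ and $B^{-1/2}$ are computed from the eigendecomposition of a randomly generated SPD $B$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, gamma = 10, 5, 3.0
A = rng.standard_normal((n, m))
y = rng.standard_normal(n)
Q = rng.standard_normal((m, m))
B = Q @ Q.T + 0.5 * np.eye(m)            # SPD by construction

# B^{1/2} and B^{-1/2} via the eigendecomposition of B
w, V = np.linalg.eigh(B)
B_half = V @ np.diag(np.sqrt(w)) @ V.T
B_neg_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

M = A @ B_neg_half
x_gamma = np.linalg.solve(A.T @ A + gamma * B, A.T @ y)
x0 = np.linalg.solve(A.T @ A, A.T @ y)

# identity: (A^T A + gamma B)^{-1} A^T y = B^{-1/2} (M^T M + gamma I)^{-1} M^T y
rhs = B_neg_half @ np.linalg.solve(M.T @ M + gamma * np.eye(m), M.T @ y)
print(np.allclose(x_gamma, rhs))  # True

# the derived inequality in the ||.||_B norm: ||B^{1/2} x'_gamma|| <= ||B^{1/2} x_0||
print(np.linalg.norm(B_half @ x_gamma) <= np.linalg.norm(B_half @ x0))  # True
```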

Note that $\|x\|_B = \sqrt{x^TBx}$ defines a norm, and $\|x\|_B = \|B^{1/2}x\|$. With that, the above becomes $$ \|x_\gamma'\|_B \leq \|x_0\|_B. $$ Now, by the Rayleigh-Ritz theorem, we have $$ c_1^{-2}:= \lambda_{\min}(B) \leq \frac{\|x\|_B^2}{\|x\|^2} = \frac{x^TBx}{x^Tx} \leq \lambda_{\max}(B) =: c_2^2 \implies\\ c_1^{-2}\|x\|^2 \leq \|x\|_B^2 \leq c_2^2 \|x\|^2. $$ Note that setting $c_1$ such that $c_1^{-2} = \lambda_{\min}(B)$ only makes sense if the smallest eigenvalue of $B$ is non-zero, which is to say that $B$ is an invertible positive definite matrix.

From there, we can conclude that $$ \|x_\gamma'\| \leq c_1 \|x_\gamma'\|_B \leq c_1 \|x_0\|_B \leq c_1 c_2 \|x_0\| = \sqrt{\kappa(B)}\|x_0\|, $$ where $\kappa(B) = \lambda_{\max}(B)/\lambda_{\min}(B)$ denotes the condition number.
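The final bound $\|x_\gamma'\| \leq \sqrt{\kappa(B)}\,\|x_0\|$ can be spot-checked over random problems (a sketch assuming NumPy; sizes and the regularization range are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(200):
    n, m = 12, 6
    A = rng.standard_normal((n, m))
    y = rng.standard_normal(n)
    Q = rng.standard_normal((m, m))
    B = Q @ Q.T + 0.1 * np.eye(m)        # random SPD B
    gamma = rng.uniform(0.1, 10.0)
    x0 = np.linalg.solve(A.T @ A, A.T @ y)
    xg = np.linalg.solve(A.T @ A + gamma * B, A.T @ y)
    # sqrt(kappa(B)) * ||x0||; for SPD B, np.linalg.cond gives lam_max/lam_min
    bound = np.sqrt(np.linalg.cond(B)) * np.linalg.norm(x0)
    assert np.linalg.norm(xg) <= bound * (1 + 1e-9)

print("bound held in all trials")
```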


I'm going to answer the alternative problem.

Since the rows of $\mathbf{K}\in\mathbb{R}^{k\times m}$ are orthonormal, $\mathbf{K}^{\top}\mathbf{K}=\text{diag}(\mathbf{v})$, where $\mathbf{v}=[v_1, \dots, v_m]^{\top}$ and $v_i\in\{0, 1\}$; $\mathbf{v}$ has exactly $k$ ones and $m-k$ zeros.

We can now use the spectral theorem, as shown here, to show that $\|\mathbf{x}''_{\gamma}\|_2\leq \|\mathbf{x}''_0\|_2$.


Edit: Never mind, this solution is not correct: $\mathbf{K}^{\top}\mathbf{K}$ is not necessarily a diagonal matrix, so the entries $v_i$ need not lie in $\{0, 1\}$, as shown by the counterexample here (rather, it is the eigenvalues of $\mathbf{K}^{\top}\mathbf{K}$ that lie in $\{0, 1\}$).
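The point of this edit can be illustrated with a tiny example (a NumPy sketch of my own): a $1\times 2$ matrix $\mathbf{K}$ with a single unit-norm row, for which $\mathbf{K}^{\top}\mathbf{K}$ is a non-diagonal projection whose eigenvalues are $0$ and $1$.

```python
import numpy as np

# K has one orthonormal row, yet K^T K is not diagonal
K = np.array([[1.0, 1.0]]) / np.sqrt(2.0)   # shape (1, 2), unit row norm
P = K.T @ K                                 # [[0.5, 0.5], [0.5, 0.5]]

print(np.allclose(P, np.diag(np.diag(P))))                 # False: not diagonal
print(np.allclose(np.linalg.eigvalsh(P), [0.0, 1.0]))      # True: eigenvalues {0, 1}
print(np.allclose(P @ P, P))                               # True: P is a projection
```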