How is $w=\lambda^{-1}X'(y-Xw)$ derived? [Ridge Regression]


In Ridge Regression we try to find the minimum of the following loss function:

$$\min_w\mathcal{L}_{\lambda}(w,S)=\min_w\left(\lambda\|w\|^2+\sum^l_{i=1}(y_i-g(x_i))^2\right)$$

Where:

  • $\lambda$ is a positive number that defines the relative trade-off between norm and loss
  • $\mathcal{L}$ is the loss function
  • $w\in\mathbb{R}^n$ is the vector of weights
  • $g(x_i)$ is the predicted value of observation $x_i$

Taking the derivative of the cost function with respect to the parameters and setting it to zero, we obtain the equation (*)

$$X'Xw+\lambda w=(X'X+\lambda I_n)w=X'w$$

Where:

  • $I_n$ is the $n\times n$ identity matrix
  • $X\in \mathbb{R}^{l\times n}$ is the data matrix
  • $X'$ is the transpose of $X$

The solution to the above equation is

$$w=(X'X+\lambda I_n)^{-1}X'y$$

Now, my book says that we can rewrite equations (*) in terms of $w$:

$$w=\lambda^{-1}X'(y-Xw)=X'\alpha$$

showing that $w$ can be written as a linear combination of the training points, $w=\sum^l_{i=1}\alpha_ix_i$, with $\alpha=\lambda^{-1}(y-Xw)$.

I have a hard time understanding how $w=\lambda^{-1}X'(y-Xw)$ is derived. Can someone show this algebraically?

Best Answer

Unfortunately, equation (*) has a typo. You can tell there's a problem on the right-hand side: the dimensions are wrong for $X^\prime\in\mathbb{R}^{n\times l}$ to multiply $w\in\mathbb{R}^n$; the right-hand side should be $X^\prime y$.

We start from the objective function: $$\mathcal{L}(w) = ||y-Xw||^2 + \lambda||w||^2$$ where $y\in\mathbb{R}^l$, $w\in\mathbb{R}^n$ and $X\in\mathbb{R}^{l\times n}$. The derivative with respect to $w$ is given by $$\nabla_w\mathcal{L} = -2X^\prime(y-Xw) + 2\lambda w,$$ where $X^\prime$ is the transpose of $X$. Setting the gradient to zero immediately gives us the expression for $w$ which you were interested in: $$ w = \frac{1}{\lambda}X^\prime(y-Xw). $$
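Spelling out the intermediate algebra between the stationarity condition and the final expression:

$$\begin{aligned}
-2X^\prime(y-Xw) + 2\lambda w &= 0 \\
\lambda w &= X^\prime(y-Xw) \\
w &= \lambda^{-1}X^\prime(y-Xw).
\end{aligned}$$

Note that $w$ appears on both sides, so this is an implicit (fixed-point) characterization of the solution rather than a closed form; its value is that it exhibits $w$ as $X^\prime\alpha$ with $\alpha=\lambda^{-1}(y-Xw)$.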

To find the correct version of (*), we just collect the terms with $w$: $$ (X^\prime X + \lambda I_n)w = X^\prime y,$$ which, when multiplied by the inverse of the left-hand matrix, leads us to the solution that you provided: $$ w = (X^\prime X + \lambda I_n)^{-1}X^\prime y.$$
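As a sanity check, a short numerical sketch (the data here is random and purely illustrative, assuming NumPy) confirms that the closed-form ridge solution satisfies the fixed-point identity $w=\lambda^{-1}X^\prime(y-Xw)$ and equals the linear combination $X^\prime\alpha$ of the training points:

```python
import numpy as np

rng = np.random.default_rng(0)
l, n, lam = 20, 5, 0.7          # l samples, n features, ridge parameter lambda
X = rng.standard_normal((l, n))
y = rng.standard_normal(l)

# Closed-form ridge solution: w = (X'X + lambda I_n)^{-1} X'y
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Fixed-point identity from the question: w = lambda^{-1} X'(y - Xw)
w_fixed_point = X.T @ (y - X @ w) / lam

# Dual coefficients alpha = lambda^{-1}(y - Xw); then w = X'alpha,
# i.e. w is a linear combination of the training points (rows of X)
alpha = (y - X @ w) / lam
w_dual = X.T @ alpha

print(np.allclose(w, w_fixed_point))  # True
print(np.allclose(w, w_dual))         # True
```

All three expressions agree to numerical precision, which is exactly the equivalence the book is asserting.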