Solution for $\beta$ in ridge regression


The RSS of the ridge regression in matrix form is:

$$RSS(\lambda) = (y-X\beta)^T(y-X\beta) + \lambda\beta^T\beta$$

and the ridge regression solution is easily seen to be

$$\beta_{ridge} = (X^TX+\lambda I)^{-1}X^Ty$$

See page 64, https://web.stanford.edu/~hastie/Papers/ESLII.pdf

How is this derived? I don't think the solution can be easily seen.


There are 2 best solutions below

On BEST ANSWER

Expanding the unpenalized term gives $(y-X\beta)^T(y-X\beta)=(y^T-\beta^TX^T)(y-X\beta)=y^Ty-2\beta^TX^Ty+\beta^TX^TX\beta$, whose derivative with respect to $\beta$ is $-2X^Ty+2X^TX\beta$. The derivative of the penalty $\lambda \beta^T\beta$ with respect to $\beta$ is $2\lambda \beta$. Adding the two,

$$\frac{\partial RSS(\lambda)}{\partial \beta}=-2X^Ty+2X^TX\beta+2\lambda \beta$$

Setting the derivative equal to $0$:

$$-2X^Ty+2X^TX\beta+2\lambda \beta=0$$

$$2X^TX\beta+2\lambda \beta=2X^Ty$$

$$(X^TX+\lambda I) \beta=X^Ty$$

$$ \beta=(X^TX+\lambda I)^{-1}X^Ty$$
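The result can be sanity-checked numerically. The following is only a minimal sketch, assuming NumPy and made-up random data (the sizes `n`, `p` and the value of `lam` are arbitrary choices, not from the question). It computes $\beta$ from the normal equations $(X^TX+\lambda I)\beta=X^Ty$ and confirms that the gradient $-2X^Ty+2X^TX\beta+2\lambda\beta$ vanishes there.

```python
import numpy as np

# Hypothetical data, for illustration only.
rng = np.random.default_rng(0)
n, p, lam = 50, 5, 0.7
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

A = X.T @ X + lam * np.eye(p)  # X^T X + lambda I

# Solve (X^T X + lambda I) beta = X^T y without forming the inverse.
beta = np.linalg.solve(A, X.T @ y)

# Same quantity written exactly as in the closed-form expression.
beta_inv = np.linalg.inv(A) @ X.T @ y

# Gradient of RSS(lambda) evaluated at beta; should be zero up to rounding.
grad = -2 * X.T @ y + 2 * X.T @ X @ beta + 2 * lam * beta

print("max |gradient| at beta:", np.max(np.abs(grad)))
print("solve and inverse agree:", np.allclose(beta, beta_inv))
```

Using `np.linalg.solve` instead of an explicit inverse is a common numerical preference; both correspond to the same formula.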

On

Differentiate $RSS(\lambda)$ with respect to $\beta$ and set the result to zero. We get

$$2(-X^T)(y-X\beta) + 2\lambda \beta=0$$

$$X^TX\beta -X^Ty+\lambda \beta = 0$$

$$(X^TX+\lambda I)\beta = X^Ty$$
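The last step inverts $X^TX+\lambda I$. As a brief sketch of why that is allowed, assuming $\lambda>0$ (the vector $v$ is introduced here only for illustration): for any nonzero $v$,

$$v^T(X^TX+\lambda I)v = \|Xv\|^2+\lambda\|v\|^2 > 0$$

so the matrix is positive definite and therefore invertible, even when $X^TX$ itself is singular.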

Hence

$$\beta = (X^TX+\lambda I)^{-1}X^Ty$$
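If the matrix derivative itself is the unclear part, it can be checked against finite differences. This is a rough sketch, assuming NumPy and arbitrary random data; the function names `rss`, `grad_rss` and `numerical_grad` are made up for this example. It compares the analytic gradient $2(-X^T)(y-X\beta)+2\lambda\beta$ with a central-difference approximation at a random point.

```python
import numpy as np

def rss(beta, X, y, lam):
    """Penalized objective (y - X beta)^T (y - X beta) + lam * beta^T beta."""
    r = y - X @ beta
    return r @ r + lam * (beta @ beta)

def grad_rss(beta, X, y, lam):
    """Analytic gradient: 2(-X^T)(y - X beta) + 2 lam beta."""
    return 2 * (-X.T) @ (y - X @ beta) + 2 * lam * beta

def numerical_grad(beta, X, y, lam, h=1e-5):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    g = np.zeros_like(beta)
    for i in range(beta.size):
        e = np.zeros_like(beta)
        e[i] = h
        g[i] = (rss(beta + e, X, y, lam) - rss(beta - e, X, y, lam)) / (2 * h)
    return g

# Hypothetical data and evaluation point, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
lam = 2.0
beta0 = rng.normal(size=4)

analytic = grad_rss(beta0, X, y, lam)
numeric = numerical_grad(beta0, X, y, lam)
print("gradients agree:", np.allclose(analytic, numeric, atol=1e-4))
```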