Ridge Regression derivation - vector to matrix

Question


98 Views Asked by Bumbble Comm At 04 Apr 2026 - 9:14

$ \min_w \frac{1}{2N} (y_n - x_n^Tw)^2 + \lambda ||w||^2 $

$ \frac{d}{dw} = -\frac{1}{N} \sum_{n=1}^N (y_n - x_n^Tw)x_n + 2\lambda w $

$ w = (X^TX + \lambda 2N I)^{-1} X^Ty $

How do I go from line 2 to line 3? How do I change from a vector to a matrix? I can only derive up to line 2.


**Bumbble Comm** · Answer 1 · 2018-01-01 13:51:45

It might help to view the original objective function in terms of matrices and vectors.

Your original objective function is missing a summation. It should read:

$$ \frac{1}{2N} \sum_{n=1}^{N} (y_n - \mathbf{x_n}^T \mathbf{w})^2 + \lambda ||\mathbf{w}||^2 $$

I've also made the vectors bold in the above expression.

This expression can be written using vectors and matrices. Let $\mathbf{y}$ be the vector with components $y_{n}$, $1 \le n \le N$.

Let $X$ be the data matrix whose $n$th column is $\mathbf{x_{n}}$, so that the $n$th component of $X^{T} \mathbf{w}$ is $\mathbf{x_n}^T \mathbf{w}$. The vector of coefficients is $\mathbf{w}$.
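This column-stacking definition is what turns per-sample sums into matrix products, which is the "vector to matrix" step the question asks about. Here is a minimal NumPy sketch of the two identities involved, $\sum_n y_n \mathbf{x_n} = X\mathbf{y}$ and $\sum_n \mathbf{x_n}\mathbf{x_n}^T = XX^T$. The sizes, the random data and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 5  # hypothetical sizes: d features, N samples

# Hypothetical data: a list of N sample vectors x_n and N targets y_n.
samples = [rng.normal(size=d) for _ in range(N)]
y = rng.normal(size=N)

# Data matrix whose n-th column is x_n, so X has shape (d, N).
X = np.column_stack(samples)

# Per-sample sums written out explicitly.
sum_yx = sum(y[n] * samples[n] for n in range(N))
sum_outer = sum(np.outer(samples[n], samples[n]) for n in range(N))

# The same quantities as matrix products.
print(np.allclose(sum_yx, X @ y))
print(np.allclose(sum_outer, X @ X.T))
```

Both checks compare a loop over samples with a single matrix expression, so either form can be substituted for the other in the derivation.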

The summation in the objective function is the squared Euclidean norm of the vector $\mathbf{y} - X^{T} \mathbf{w}$, so we can write the function as follows:

$$ \frac{1}{2N} \left\lVert \mathbf{y} - X^{T} \mathbf{w} \right\rVert^{2} + \lambda ||\mathbf{w}||^2 $$

Since $\lVert \mathbf{v} \rVert^{2} = \mathbf{v}^{T} \mathbf{v}$ for any vector $\mathbf{v}$, this can be written as $$ \frac{1}{2N} \left( \mathbf{y} - X^{T} \mathbf{w} \right)^{T} \left( \mathbf{y} - X^{T} \mathbf{w} \right) + \lambda \mathbf{w}^{T} \mathbf{w} $$
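As a sanity check that the summation form and the matrix form are the same function, the following sketch evaluates both on random data. The function names, sizes and the value of `lam` are hypothetical choices for illustration only.

```python
import numpy as np

def objective_sum(X, y, w, lam):
    """Objective as an explicit sum over samples; column n of X is x_n."""
    N = y.shape[0]
    total = sum((y[n] - X[:, n] @ w) ** 2 for n in range(N))
    return total / (2 * N) + lam * (w @ w)

def objective_matrix(X, y, w, lam):
    """The same objective written with the residual vector y - X^T w."""
    N = y.shape[0]
    r = y - X.T @ w
    return (r @ r) / (2 * N) + lam * (w @ w)

rng = np.random.default_rng(0)
d, N = 3, 50  # hypothetical sizes
X = rng.normal(size=(d, N))
y = rng.normal(size=N)
w = rng.normal(size=d)
lam = 0.1  # hypothetical regularization strength

print(objective_sum(X, y, w, lam), objective_matrix(X, y, w, lam))
```

The two printed values should agree up to floating-point rounding.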

Expanding: $$ \frac{1}{2N} \left[\; \mathbf{y}^{T} \mathbf{y} - \mathbf{y}^{T} X^{T} \mathbf{w} - \mathbf{w}^{T} X \mathbf{y} + \mathbf{w}^{T} X X^{T} \mathbf{w} \;\right] + \lambda \mathbf{w}^{T} \mathbf{w} $$

Both cross terms are scalars, and a scalar equals its own transpose, so $\mathbf{y}^{T} X^{T} \mathbf{w} = \mathbf{w}^{T} X \mathbf{y}$ and we can write: $$ \frac{1}{2N} \left[\; \mathbf{y}^{T} \mathbf{y} - 2 \mathbf{w}^{T} X \mathbf{y} + \mathbf{w}^{T} X X^{T} \mathbf{w} \;\right] + \lambda \mathbf{w}^{T} \mathbf{w} $$

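Differentiating with respect to $\mathbf{w}$, using $\frac{\partial}{\partial \mathbf{w}} \mathbf{w}^{T} \mathbf{a} = \mathbf{a}$ and $\frac{\partial}{\partial \mathbf{w}} \mathbf{w}^{T} A \mathbf{w} = 2 A \mathbf{w}$ for symmetric $A$, gives $$ \begin{aligned} & \frac{1}{2N} \left[\; - 2 X \mathbf{y} + 2 X X^{T} \mathbf{w} \;\right] + 2 \lambda \mathbf{w} \\ =& - \frac{1}{N} X \mathbf{y} + \frac{1}{N} X X^{T} \mathbf{w} + 2 \lambda \mathbf{w} \end{aligned} $$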
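If you want to convince yourself that this gradient is right, one option is to compare it with a central finite-difference approximation of the objective. The sketch below does that on random data. All names, sizes and the step `eps` are illustrative assumptions.

```python
import numpy as np

def objective(X, y, w, lam):
    """Ridge objective in matrix form; column n of X is x_n."""
    N = y.shape[0]
    r = y - X.T @ w
    return (r @ r) / (2 * N) + lam * (w @ w)

def gradient(X, y, w, lam):
    """Analytic gradient: -(1/N) X y + (1/N) X X^T w + 2 lam w."""
    N = y.shape[0]
    return (-X @ y + X @ (X.T @ w)) / N + 2 * lam * w

def numerical_gradient(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
d, N = 3, 50  # hypothetical sizes
X = rng.normal(size=(d, N))
y = rng.normal(size=N)
w = rng.normal(size=d)
lam = 0.1  # hypothetical regularization strength

analytic = gradient(X, y, w, lam)
numeric = numerical_gradient(lambda v: objective(X, y, v, lam), w)
print(np.max(np.abs(analytic - numeric)))
```

Because the objective is quadratic in $\mathbf{w}$, the central difference has no truncation error, so the printed discrepancy should be at the level of rounding noise.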

Setting this to zero and multiplying through by $N$: $$ \begin{aligned} - \frac{1}{N} X \mathbf{y} + \frac{1}{N} X X^{T} \mathbf{w} + 2 \lambda \mathbf{w} &= 0 \\ X X^{T} \mathbf{w} + 2 N \lambda \mathbf{w} &= X \mathbf{y} \\ \left( X X^{T} + 2 N \lambda I \right) \mathbf{w} &= X \mathbf{y} \end{aligned} $$

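For $\lambda > 0$ the matrix $X X^{T} + 2 N \lambda I$ is positive definite and therefore invertible, which finally gives $$ \mathbf{w} = \left( X X^{T} + 2 N \lambda I \right)^{-1} X \mathbf{y} $$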
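A small numerical sketch of this closed form, assuming NumPy and random data: it solves the linear system $(XX^T + 2N\lambda I)\mathbf{w} = X\mathbf{y}$ rather than forming the inverse explicitly, then checks that the gradient vanishes at the result. The function name `ridge_closed_form` and all sizes are hypothetical.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X X^T + 2 N lam I) w = X y, where column n of X is x_n."""
    d, N = X.shape
    A = X @ X.T + 2 * N * lam * np.eye(d)
    return np.linalg.solve(A, X @ y)

rng = np.random.default_rng(0)
d, N = 3, 50  # hypothetical sizes
X = rng.normal(size=(d, N))
y = rng.normal(size=N)
lam = 0.1  # hypothetical regularization strength

w_hat = ridge_closed_form(X, y, lam)

# The gradient from the derivation should vanish at the minimizer.
grad_at_w_hat = (-X @ y + X @ (X.T @ w_hat)) / N + 2 * lam * w_hat
print(np.linalg.norm(grad_at_w_hat))
```

Using `np.linalg.solve` instead of `np.linalg.inv` is a design choice for numerical stability. Mathematically it is the same formula.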

I'm not sure why I end up with a slightly different expression from yours. Please let me know if you can see why.
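One possible explanation, offered as an assumption because the question does not say how its $X$ is laid out: if the question's $X$ stacks the samples as rows (so it is the transpose of the $X$ used in this answer), then its $X^TX$ and $X^Ty$ are exactly this answer's $XX^T$ and $X\mathbf{y}$, and the two formulas describe the same $\mathbf{w}$. The sketch below checks that numerically. `X_cols` and `X_rows` are hypothetical names for the two layouts.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 50  # hypothetical sizes
lam = 0.1     # hypothetical regularization strength

X_cols = rng.normal(size=(d, N))  # this answer's layout: column n is x_n
X_rows = X_cols.T                 # assumed layout in the question: row n is x_n^T
y = rng.normal(size=N)

# This answer's formula: (X X^T + 2 N lam I)^{-1} X y
w_answer = np.linalg.solve(
    X_cols @ X_cols.T + 2 * N * lam * np.eye(d), X_cols @ y
)

# The question's formula: (X^T X + 2 N lam I)^{-1} X^T y
w_question = np.linalg.solve(
    X_rows.T @ X_rows + 2 * N * lam * np.eye(d), X_rows.T @ y
)

print(np.allclose(w_answer, w_question))
```

Under that assumption the difference is only a matter of notation, not of the result.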
