For $\hat{\beta}_{\mathrm{Ridge}}$, where is the correct placement of $(X^TX+\lambda I)^{-1}$?


I am having some trouble finishing the derivation of the $\hat{\beta}$ that minimizes $(Y-X\beta)^T(Y-X\beta) + \lambda \beta^T\beta$. After taking the partial derivative with respect to $\beta$ and setting it to zero, I get

\begin{align} \hat{\beta}^T(X^TX+\lambda I) = Y^TX. \end{align}

With $X$ an $n \times p$ matrix, $Y$ an $n \times 1$ column vector, and $\hat{\beta}$ a $p \times 1$ column vector, I multiply both sides by the inverse of $(X^TX+\lambda I)$. However, on which side does this inverse go? I get $\hat{\beta}^T = Y^TX(X^TX+\lambda I)^{-1}$, but I see others write $\hat{\beta} = (X^TX+\lambda I)^{-1}X^TY$.

Two questions: on which side of the product does $(X^TX+\lambda I)^{-1}$ belong?

Also, if $X,Y$ are two column vectors, is $X^TY = Y^TX$?

Best answer:

Typically one writes the gradient as a column vector. In that convention, the gradient is $$2(X^\top X + \lambda I) \hat{\beta} - 2X^\top Y.$$ If you instead write the gradient as a row vector, it would be $$2\hat{\beta}^\top(X^\top X + \lambda I) - 2Y^\top X.$$ (Note carefully the shapes of $X^\top Y$ and $Y^\top X$.) Manipulating the latter will give you an expression for $\hat{\beta}^\top$ so it is fine that the inverse matrix appears on the right. Taking a final transpose will bring the inverse matrix to the left.
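The equivalence of the two placements can be checked numerically. Below is a minimal sketch using NumPy with randomly generated data (the variable names and dimensions are illustrative, not from the question): it verifies that the column-vector solution $(X^\top X + \lambda I)^{-1} X^\top Y$ is exactly the transpose of the row-vector solution $Y^\top X (X^\top X + \lambda I)^{-1}$, and that $x^\top y = y^\top x$ for column vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))   # design matrix, n x p
Y = rng.normal(size=(n, 1))   # response, n x 1
lam = 0.1

# A is symmetric, so A^{-1} is too; it can multiply from either side.
A = X.T @ X + lam * np.eye(p)

# Column-vector convention: beta_hat = (X^T X + lambda I)^{-1} X^T Y
beta_col = np.linalg.solve(A, X.T @ Y)   # shape (p, 1)

# Row-vector convention: beta_hat^T = Y^T X (X^T X + lambda I)^{-1}
beta_row = (Y.T @ X) @ np.linalg.inv(A)  # shape (1, p)

# The two conventions agree up to a transpose.
assert np.allclose(beta_col, beta_row.T)

# Second question: for column vectors x, y, both x^T y and y^T x
# are 1x1 (scalars), and they are equal.
x = rng.normal(size=(p, 1))
y = rng.normal(size=(p, 1))
assert np.allclose(x.T @ y, y.T @ x)
```

Note that this scalar equality holds only when $x$ and $y$ are column vectors; for general matrices, $X^\top Y$ and $Y^\top X$ have different shapes and are transposes of each other, not equal.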