Differentiating a matrix function


In the book "Elements of Statistical Learning", early on the author discusses linear regression, and naturally the residual sum of squares (RSS) as a function of the parameters $\boldsymbol{\beta}$. In the general formulation, $\text{RSS}(\beta) = (\boldsymbol{y} - \boldsymbol{X}\beta)^T(\boldsymbol{y} - \boldsymbol{X}\beta)$, where $\boldsymbol{X}$ is an $N \times p$ matrix and $\beta$ is a $p \times K$ matrix. The author then says that to minimize RSS, you differentiate with respect to $\beta$ and get $\boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{X}\beta) = 0$.

My question is: what are the mechanics of differentiating with respect to the matrix $\beta$? I have a B.S. in physics, so I have a reasonably sophisticated math background, but I never covered this in my undergraduate education. I tried looking a bit into "matrix calculus", but it wasn't much help. Is that the correct term? If this is the language used in the remainder of the textbook, what are some good resources for someone familiar with vector calculus and linear algebra to learn "matrix calculus"?


BEST ANSWER

It might help to expand $RSS(\beta)$:

$$ RSS(\beta) = \textbf{y}^T\textbf{y} - \textbf{y}^T\textbf{X}\beta - \beta^T\textbf{X}^T\textbf{y} + \beta^T\textbf{X}^T\textbf{X}\beta $$
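(Not part of the original answer, but the expansion is easy to sanity-check numerically; a quick NumPy sketch with arbitrary sizes:)

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 6, 3
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)
beta = rng.standard_normal(p)

# Compact form: (y - X beta)^T (y - X beta)
r = y - X @ beta
rss = r @ r

# Expanded form: y^T y - y^T X beta - beta^T X^T y + beta^T X^T X beta
expanded = y @ y - y @ X @ beta - beta @ X.T @ y + beta @ X.T @ X @ beta

assert np.allclose(rss, expanded)
```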

There are many well-known identities for differentiating with respect to a vector $\beta$. If you're looking for a resource, Wikipedia has an extensive list of them. The ones we're interested in are

$$ \frac{d\,\textbf{a}^T\textbf{x}}{d\textbf{x}} = \frac{d\,\textbf{x}^T\textbf{a}}{d\textbf{x}} = \textbf{a} \quad\text{and}\quad \frac{d\,\textbf{x}^T\textbf{A}\textbf{x}}{d\textbf{x}} = 2\textbf{A}\textbf{x}, $$

where $\textbf{A}$ is a symmetric matrix (not a function of $\textbf{x}$) and $\textbf{a}$ is a vector (not a function of $\textbf{x}$). Use the first identity on the two middle terms in $RSS(\beta)$ and the second identity on the last term:

$$ \frac{d}{d\beta}RSS(\beta) = -2\textbf{X}^T\textbf{y} + 2\textbf{X}^T\textbf{X}\beta = 0 $$
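(A numerical check, not part of the original answer: the analytic gradient above can be verified against a central finite-difference approximation of $RSS$; sizes here are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 8, 3
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)
beta = rng.standard_normal(p)

def rss(b):
    """Residual sum of squares (y - X b)^T (y - X b)."""
    r = y - X @ b
    return r @ r

# Central finite-difference gradient of RSS at beta
h = 1e-6
grad_fd = np.array([
    (rss(beta + h * e) - rss(beta - h * e)) / (2 * h)
    for e in np.eye(p)
])

# Analytic gradient: -2 X^T y + 2 X^T X beta
grad_analytic = -2 * X.T @ y + 2 * X.T @ X @ beta

assert np.allclose(grad_fd, grad_analytic, atol=1e-4)
```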

Rearrange to get the normal equations. Since this is in the context of statistics, you might want to look at this question posted on Cross Validated. It gives many good references on matrix algebra in statistics.
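(Again an addition for illustration: rearranging the stationarity condition gives the normal equations $\textbf{X}^T\textbf{X}\beta = \textbf{X}^T\textbf{y}$, which can be solved directly. A NumPy sketch; in practice a least-squares routine such as `np.linalg.lstsq`, which uses a more numerically stable factorization, is preferable to forming $\textbf{X}^T\textbf{X}$ explicitly.)

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 50, 3
X = rng.standard_normal((N, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.standard_normal(N)

# Solve the normal equations X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent least-squares solve (more numerically stable)
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(beta_hat, beta_ls)
```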


I find this document (the Matrix Cookbook) helpful for these kinds of questions: http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf