Derivation of $\frac{\partial}{\partial\beta} (y - X\beta)^T (y - X\beta) = -2X^T(y - X\beta)$ using the Matrix Cookbook.


I am currently trying to differentiate the function

$$SS(\beta) = (y - X\beta)^T(y - X\beta)$$

with respect to the vector $\beta$ using the notation of the Matrix Cookbook. Here, $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ and $\beta \in \mathbb{R}^p$.

First, $SS(\beta)$ is a scalar and $\beta$ is a $p$-dimensional column vector. Therefore, the derivative should be a $p$-dimensional column vector as well (see page 8 in the cookbook). Using identity (37) in the cookbook (the product rule), I find

$$ \frac{\partial}{\partial\beta} SS(\beta) = \Big[ \frac{\partial}{\partial\beta} (y - X\beta)^T \Big] \cdot (y - X\beta) + (y - X\beta)^T \cdot \frac{\partial}{\partial\beta} (y - X\beta). $$

For the first derivative, we can use identity (44) (the derivative of the transpose equals the transpose of the derivative). Finally, we use

$$ \frac{\partial}{\partial\beta}(y - X\beta) = -\frac{\partial}{\partial\beta} X\beta = -X. $$

Together, this yields

$$ \frac{\partial}{\partial\beta} SS(\beta) = -X^T (y - X\beta) - (y - X\beta)^T X. $$

This is generally not equal to $-2X^T(y - X\beta)$, which is what I expected. Where did I go wrong here?
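
For what it's worth, the expected result can be sanity-checked numerically before hunting for the algebra mistake. The snippet below is only a minimal sketch in plain Python: it compares $-2X^T(y - X\beta)$ against a central finite-difference approximation of the gradient of $SS(\beta)$ on random data. The function names (`ss`, `closed_form_gradient`, `finite_difference_gradient`), the sizes $n = 5$, $p = 3$, and the step `h` are my own choices for illustration, not anything from the cookbook.

```python
import random


def ss(X, y, beta):
    """Residual sum of squares (y - X beta)^T (y - X beta)."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for row, yi in zip(X, y)]
    return sum(r * r for r in residuals)


def closed_form_gradient(X, y, beta):
    """Candidate gradient -2 X^T (y - X beta), as a list of length p."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for row, yi in zip(X, y)]
    return [-2.0 * sum(X[i][j] * residuals[i] for i in range(len(y)))
            for j in range(len(beta))]


def finite_difference_gradient(X, y, beta, h=1e-6):
    """Central differences, perturbing one coordinate of beta at a time."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        up[j] += h
        down = list(beta)
        down[j] -= h
        grad.append((ss(X, y, up) - ss(X, y, down)) / (2 * h))
    return grad


# Arbitrary random test data (assumed sizes, illustration only).
random.seed(0)
n, p = 5, 3
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
beta = [random.gauss(0, 1) for _ in range(p)]

analytic = closed_form_gradient(X, y, beta)
numeric = finite_difference_gradient(X, y, beta)
max_abs_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print("largest componentwise difference:", max_abs_diff)
```

If the two gradients agree up to rounding error, the target formula is right as a $p$-dimensional column vector, and the problem must lie in how the two terms of the product rule were combined.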