I am currently trying to differentiate the function
$$SS(\beta) = (y - X\beta)^T(y - X\beta)$$
with respect to the vector $\beta$ using the notation of the Matrix Cookbook. Here, $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\beta \in \mathbb{R}^p$. \
First, $SS(\beta)$ is a scalar and $\beta$ is a $p$-dimensional column vector. Therefore, the derivative should be a $p$-dimensional column vector as well (see page 8 in the cookbook). Using identity (37) in the cookbook (the product rule), I find
$$ \frac{\partial}{\partial\beta} SS(\beta) = \Big[ \frac{\partial}{\partial\beta} (y - X\beta)^T \Big] \cdot (y - X\beta) + (y - X\beta)^T \cdot \frac{\partial}{\partial\beta} (y - X\beta). $$ For the first derivative, we can use identity (44) (derivative of transpose is equal to transpose of derivative). Finally, we use
$$ \frac{\partial}{\partial\beta}(y - X\beta) = -\frac{\partial}{\partial\beta} X\beta = -X. $$
Together, this yields $$ \frac{\partial}{\partial\beta} SS(\beta) = -X^T (y - X\beta) - (y - X\beta)^T X. $$
This is generally not equal to $-2X^T(y - X\beta)$, which is what I expected. Where did I go wrong here?
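
For what it's worth, a quick numerical check seems to support the expected result. The following is only a minimal sketch in plain Python on random data; the helper names (`ss`, `expected_gradient`, `numerical_gradient`) and the sizes $n = 5$, $p = 3$ are my own choices for illustration, not anything from the cookbook. It compares $-2X^T(y - X\beta)$ against central finite differences of $SS(\beta)$:

```python
import random


def residuals(beta, X, y):
    """Residual vector y - X beta, as a list of length n."""
    return [yi - sum(xij * bj for xij, bj in zip(row, beta))
            for row, yi in zip(X, y)]


def ss(beta, X, y):
    """Sum of squares SS(beta) = (y - X beta)^T (y - X beta)."""
    return sum(r * r for r in residuals(beta, X, y))


def expected_gradient(beta, X, y):
    """Candidate gradient -2 X^T (y - X beta), as a list of length p."""
    r = residuals(beta, X, y)
    return [-2.0 * sum(X[i][j] * r[i] for i in range(len(y)))
            for j in range(len(beta))]


def numerical_gradient(beta, X, y, h=1e-6):
    """Central finite differences of SS, one coordinate of beta at a time."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[j] += h
        down[j] -= h
        grad.append((ss(up, X, y) - ss(down, X, y)) / (2 * h))
    return grad


# Arbitrary random test problem (sizes chosen only for illustration).
random.seed(0)
n, p = 5, 3
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
beta = [random.gauss(0, 1) for _ in range(p)]

analytic = expected_gradient(beta, X, y)
numeric = numerical_gradient(beta, X, y)
max_abs_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print("largest componentwise difference:", max_abs_diff)
```

The difference should be tiny (only floating-point rounding, since $SS$ is quadratic in $\beta$), so the formula I expected looks right and the problem must be somewhere in my derivation.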