Why is the second derivative denoted by $\frac{\partial^2}{\partial x\,\partial x^T}$ in matrix calculus?

When deriving the first and second derivatives of the residual sum of squares (RSS) of a linear model using the denominator layout, I get:

$\frac{\partial}{\partial\beta}(y - X\beta)^T(y - X\beta) = \frac{\partial(y - X\beta)}{\partial\beta}\frac{\partial(y - X\beta)^T(y - X\beta)}{\partial(y - X\beta)} = (-X)^T (2 (y - X\beta)) = -2X^T(y - X\beta)$
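As a quick sanity check of this first derivative, here is a minimal sketch assuming NumPy and randomly generated $X$, $y$, $\beta$ (hypothetical data, not from the textbook). It compares $-2X^T(y - X\beta)$ with a central-difference gradient of the RSS. In the denominator layout the gradient of a scalar with respect to the column vector $\beta$ has the same shape as $\beta$, which here is a 1-D array of length `p`.

```python
import numpy as np

def rss(beta, X, y):
    """Residual sum of squares (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return float(r @ r)

def numerical_gradient(f, beta, h=1e-6):
    """Central-difference gradient of a scalar function f at beta.

    Entry i approximates d f / d beta_i, so the result has the same
    shape as beta (denominator layout for a scalar-by-vector derivative).
    """
    grad = np.zeros_like(beta)
    for i in range(beta.size):
        e = np.zeros_like(beta)
        e[i] = h
        grad[i] = (f(beta + e) - f(beta - e)) / (2 * h)
    return grad

# Hypothetical random problem: n observations, p coefficients.
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

analytic = -2 * X.T @ (y - X @ beta)
numeric = numerical_gradient(lambda b: rss(b, X, y), beta)
print(np.allclose(analytic, numeric, atol=1e-5))
```

Because the RSS is quadratic in $\beta$, the central difference has no truncation error here, so any mismatch would point to an error in the analytic formula rather than in the approximation.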

$\frac{\partial}{\partial\beta}(-2X^T(y - X\beta)) = \frac{\partial(y - X\beta)}{\partial\beta}\frac{\partial(-2X^T(y - X\beta))}{\partial(y - X\beta)} = (-X)^T(-2X^T)^T = 2 X^T X$
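The second step can be checked the same way. This minimal sketch (again assuming NumPy and hypothetical random data) differentiates the gradient $-2X^T(y - X\beta)$ numerically, arranging the result in the denominator layout, where entry $[i, j]$ approximates $\partial g_j / \partial \beta_i$, and compares it with $2X^TX$.

```python
import numpy as np

def gradient(beta, X, y):
    """First derivative of the RSS: -2 X^T (y - X beta)."""
    return -2 * X.T @ (y - X @ beta)

def numerical_jacobian_denominator(g, beta, h=1e-6):
    """Central-difference derivative of a vector function g w.r.t. beta.

    Denominator layout: entry [i, j] approximates d g_j / d beta_i,
    so row i corresponds to the perturbed component beta_i.
    """
    p = beta.size
    J = np.zeros((p, g(beta).size))
    for i in range(p):
        e = np.zeros_like(beta)
        e[i] = h
        J[i, :] = (g(beta + e) - g(beta - e)) / (2 * h)
    return J

# Hypothetical random problem: n observations, p coefficients.
rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

analytic_hessian = 2 * X.T @ X
numeric_hessian = numerical_jacobian_denominator(lambda b: gradient(b, X, y), beta)
print(np.allclose(analytic_hessian, numeric_hessian, atol=1e-5))
```

Because $2X^TX$ is symmetric, this check confirms the value of the second derivative but cannot by itself distinguish a denominator-layout arrangement from a numerator-layout one.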

My question is: why does the textbook denote the second derivative as

$\frac{\partial^2 (y - X\beta)^T(y - X\beta)}{\partial\beta\,\partial\beta^T} = 2 X^T X$

rather than

$\frac{\partial^2 (y - X\beta)^T(y - X\beta)}{\partial\beta\,\partial\beta} = 2 X^T X$

Is there anything wrong with my derivation?