In a multivariate linear model, I have come across the following matrix-valued function of $\beta \in \Bbb R^p$:
$$\beta \mapsto(y-X\beta)(y-X\beta)^{T}$$
where the matrix $X \in \Bbb R^{n \times p}$ and the vector $y \in \Bbb R^n$ are given. I need to differentiate it with respect to $\beta \in \Bbb R^p$. Can anyone please help me with how to do this?
Other examples I have seen on this site differentiate $(y-X\beta)^{T}(y-X\beta)$, which is a scalar, but here the expression is an $n \times n$ matrix and I am not sure how to handle that. I would also appreciate references or reading material on this kind of matrix-vector differentiation for beginners.
The notion of differentiation in this context goes back to the meaning of the derivative as a first-order approximation near a given point: denoting your function by $f$, we require
$$ f(\beta + \varepsilon \alpha) = f(\beta) + \varepsilon f'(\beta) \cdot \alpha + o(\varepsilon) $$
so $f'(\beta)$ will be a linear operator from vectors in $\Bbb R^p$ to $n \times n$ matrices. In this case, you can carry out the expansion yourself:
$$ \begin{aligned} f(\beta + \varepsilon \alpha) &= (y - X \beta - \varepsilon X \alpha) (y - X \beta - \varepsilon X \alpha)^T \\ &= (y - X \beta)(y - X \beta)^T - \varepsilon \left( X \alpha (y - X \beta)^T + (y - X \beta) (X \alpha)^T \right) + O(\varepsilon^2) \end{aligned} $$
and, matching the $\varepsilon$ term with the definition above, you find:
$$ f'(\beta) : \alpha \mapsto -\left( X\alpha (y - X \beta)^T + (y - X \beta) \alpha^T X^T \right). $$
Note the minus sign, which comes from the $-\varepsilon X\alpha$ in the expansion.
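As a sanity check, you can compare this closed-form directional derivative against a finite-difference quotient numerically. This is just a sketch with NumPy; the sizes $n$, $p$, the random seed, and the step $\varepsilon$ are arbitrary choices of mine, not from the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)
alpha = rng.standard_normal(p)  # direction in which we differentiate

def f(b):
    r = y - X @ b          # residual vector, shape (n,)
    return np.outer(r, r)  # (y - Xb)(y - Xb)^T, an n x n matrix

# Closed-form directional derivative: -( (Xa) r^T + r (Xa)^T )
r = y - X @ beta
Xa = X @ alpha
analytic = -(np.outer(Xa, r) + np.outer(r, Xa))

# Finite-difference approximation ( f(beta + eps*alpha) - f(beta) ) / eps
eps = 1e-6
numeric = (f(beta + eps * alpha) - f(beta)) / eps

# The difference should be O(eps), i.e. tiny
print(np.max(np.abs(analytic - numeric)))
```

The leftover error is exactly the $O(\varepsilon^2)$ term $\varepsilon\, X\alpha (X\alpha)^T / \varepsilon$, so shrinking `eps` shrinks the discrepancy linearly.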