I am reading a textbook that states results on the multivariate normal distribution with no justification; each one takes me about a page to verify, and they always turn out to be correct. These claims don't seem obvious to me. I'll give an example.
Here's the setup:
Let $y_1,y_2,\dots,y_n, \mu \in \mathbb{R}^d$, and let $A \in \mathbb{R}^{d \times d}$.
Let $Y$ be the $n \times d$ matrix whose $i$-th row is $y_i^T$.
Let $e \in \mathbb{R}^n$ be a vector of 1's.
An example of a claim:
$\sum_{i}(y_i-\mu)^{T}A(y_i-\mu)=\text{trace}(A(Y-e\mu^{T})^{T}(Y-e\mu^{T}))$
This claim took me a long time to verify, yet the author gave no justification and simply continued from his previous line of work. There are more claims like this that would take me a while to verify, and given how often this happens, I feel like I might be missing a trick.
Can anyone explain how I can get better at verifying these, or at least confirm that I'm not crazy and they aren't super trivial?
Many thanks.
Here is a short derivation of the identity in question.
$$ \sum_{i}(y_i-\mu)^{T}A(y_i-\mu) = \sum_{i}\operatorname{tr}\left[(y_i-\mu)^{T}A(y_i-\mu)\right] = \sum_i \operatorname{tr}\left[A(y_i-\mu)(y_i-\mu)^{T}\right] = \operatorname{tr}\left[A\sum_i(y_i-\mu)(y_i-\mu)^{T}\right], $$ where the first equality holds because each summand is a scalar (so it equals its own trace) and the second uses the cyclic property $\operatorname{tr}(BC)=\operatorname{tr}(CB)$. Now, note that $$ \sum_i(y_i-\mu)(y_i-\mu)^{T} = \pmatrix{y_1 - \mu & \cdots & y_n - \mu} \pmatrix{(y_1 - \mu)^T \\ \vdots \\ (y_n - \mu)^T} = (Y - e\mu^T)^T(Y - e\mu^T). $$
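If you ever want a quick sanity check on identities like this before grinding through the algebra, you can test them numerically on random inputs. Here is a small sketch with NumPy (the variable names mirror the notation above):

```python
# Numerically check: sum_i (y_i - mu)^T A (y_i - mu) == tr(A (Y - e mu^T)^T (Y - e mu^T))
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
Y = rng.standard_normal((n, d))   # rows of Y are the vectors y_i
mu = rng.standard_normal(d)
A = rng.standard_normal((d, d))

# Left-hand side: sum over rows of the quadratic forms
lhs = sum((y - mu) @ A @ (y - mu) for y in Y)

# Right-hand side: e is the length-n vector of 1's, so e mu^T has mu^T in every row
R = Y - np.outer(np.ones(n), mu)
rhs = np.trace(A @ R.T @ R)

print(np.isclose(lhs, rhs))  # True
```

This doesn't replace a proof, of course, but it catches transcription errors (e.g. a wrong transpose or the wrong length for $e$) in seconds.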