Linear Algebra Wizardry

I am reading a textbook which features results on the multivariate normal distribution and the author writes out results with no justification, which take me a page to verify, and they end up being right. These claims don't seem obvious to me. I'll give an example.

Here's the setup:

Let $y_1, y_2, \ldots, y_n, \mu \in \mathbb{R}^d$, and let $A \in \mathbb{R}^{d \times d}$.

Let $Y$ be the $n \times d$ matrix whose $i$-th row is $y_i^T$.

Let $e \in \mathbb{R}^n$ be a vector of 1's, so that $e\mu^{T}$ is the $n \times d$ matrix with every row equal to $\mu^{T}$.

An example of a claim:

$\sum_{i}(y_i-\mu)^{T}A(y_i-\mu)=\text{trace}(A(Y-e\mu^{T})^{T}(Y-e\mu^{T}))$

This claim took me a long time to verify, but the author included zero justification and just followed it on from his last line of work. There are more examples of claims he makes which would take me a while to verify, but I feel like I might be missing a trick given the frequency of this happening.

Can anyone explain how either I can get better at doing these, or at least some verification that I'm not crazy and it's not super trivial?

Many thanks.

There are 3 best solutions below

Best answer

Here is a short derivation of the identity in question.

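Each summand is a scalar and so equals its own trace; the cyclic property $\operatorname{tr}(BC) = \operatorname{tr}(CB)$ and linearity of the trace then give

$$ \begin{aligned} \sum_{i}(y_i-\mu)^{T}A(y_i-\mu) &= \sum_{i}\operatorname{tr}\left[(y_i-\mu)^{T}A(y_i-\mu)\right] \\ &= \sum_i \operatorname{tr}\left[A(y_i-\mu)(y_i-\mu)^{T}\right] \\ &= \operatorname{tr}\left[ A\sum_i(y_i-\mu)(y_i-\mu)^{T}\right]. \end{aligned} $$

Now, note that

$$ \sum_i(y_i-\mu)(y_i-\mu)^{T} = \begin{pmatrix} y_1 - \mu & \cdots & y_n - \mu \end{pmatrix} \begin{pmatrix} (y_1 - \mu)^T \\ \vdots \\ (y_n - \mu)^T \end{pmatrix} = (Y - e\mu^T)^T(Y - e\mu^T). $$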
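
If the block-matrix product in the last step feels too quick, it can also be checked entry by entry. As a sketch, write $y_{ij}$ for the $j$-th component of $y_i$ (this index notation is introduced here for illustration and is not in the question), and take $e$ to be the length-$n$ vector of ones, so that row $i$ of $Y - e\mu^T$ is $(y_i-\mu)^T$. Then the $(j,k)$ entry of each side is

$$ \left[(Y - e\mu^T)^T(Y - e\mu^T)\right]_{jk} = \sum_{i=1}^{n} (y_{ij}-\mu_j)(y_{ik}-\mu_k) = \left[\sum_{i=1}^{n}(y_i-\mu)(y_i-\mu)^T\right]_{jk}. $$

The first equality is the definition of the matrix product, and the second holds because the $(j,k)$ entry of the outer product $(y_i-\mu)(y_i-\mu)^T$ is $(y_{ij}-\mu_j)(y_{ik}-\mu_k)$.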

Answer

"Can anyone explain how either I can get better at doing these"

The same way you get better at doing anything else - by practice.


"or at least some verification that I'm not crazy and it's not super trivial?"

It's not trivial in the slightest, but that's not the criterion. The question is not "is it trivial?" but "is it interesting enough?".

Let me explain.

The book you are reading is not about linear algebra; it is about something else, which in your case sounds like probability. When writing a book about one subject, it is usually best if the author includes as little of other subjects as possible, while still keeping the book understandable. The derivation of the claim you cite is a purely algebraic manipulation that the author did not consider interesting enough to include.

Answer

Being familiar with some properties of trace might help.

We know that $Tr(AB)=Tr(BA)$ whenever both products are defined, that $Tr(A+B)=Tr(A)+Tr(B)$, and that a scalar equals its own trace. Hence

\begin{align} \sum_i (y_i - \mu)^TA(y_i - \mu) &= \sum_i Tr((y_i - \mu)^TA(y_i - \mu)) \\ &=\sum_iTr(A(y_i - \mu)(y_i - \mu)^T) \\ &=Tr(\sum_iA(y_i - \mu)(y_i - \mu)^T) \\ &=Tr(A\sum_i(y_i - \mu)(y_i - \mu)^T)\\ &=Tr(A\left[y_1-\mu, \ldots, y_n-\mu \right]\left[y_1-\mu, \ldots, y_n-\mu \right]^T)\\ &=Tr(A(Y-e\mu^T)^T(Y-e\mu^T)) \end{align}
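
When a textbook states an identity like this without proof, a quick numerical check on random data can confirm it before you spend a page on the algebra. Below is a minimal sketch in plain Python (standard library only). The function names `quadratic_form_sum` and `trace_form` are made up for this illustration, and $Y$ is stored as a list of rows, so subtracting $\mu$ from each row plays the role of $Y - e\mu^T$.

```python
import random


def quadratic_form_sum(Y, mu, A):
    """Left-hand side: sum_i (y_i - mu)^T A (y_i - mu), with y_i the rows of Y."""
    d = len(mu)
    total = 0.0
    for y in Y:
        r = [y[j] - mu[j] for j in range(d)]  # residual y_i - mu
        total += sum(r[j] * A[j][k] * r[k] for j in range(d) for k in range(d))
    return total


def trace_form(Y, mu, A):
    """Right-hand side: trace(A (Y - e mu^T)^T (Y - e mu^T))."""
    n, d = len(Y), len(mu)
    # Subtracting mu from every row of Y is exactly Y - e mu^T.
    R = [[Y[i][j] - mu[j] for j in range(d)] for i in range(n)]
    # S = R^T R is the d x d matrix sum_i (y_i - mu)(y_i - mu)^T.
    S = [[sum(R[i][j] * R[i][k] for i in range(n)) for k in range(d)]
         for j in range(d)]
    # trace(A S) = sum_j sum_k A[j][k] * S[k][j]
    return sum(A[j][k] * S[k][j] for j in range(d) for k in range(d))


random.seed(0)
n, d = 5, 3  # arbitrary small sizes chosen for the check
Y = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
mu = [random.gauss(0, 1) for _ in range(d)]
# A is deliberately not symmetric: the identity does not need symmetry.
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

lhs = quadratic_form_sum(Y, mu, A)
rhs = trace_form(Y, mu, A)
print(abs(lhs - rhs) < 1e-9)  # the two sides should agree up to rounding
```

A check like this proves nothing, but it catches transposition and dimension slips quickly: for example, the product $e\mu^T$ only has the same shape as $Y$ when $e$ has $n$ entries.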