Why is matrix calculus still taught / useful?

Wikipedia has a huge page on matrix calculus, and there are resources such as The Matrix Cookbook that contain long lists of rules for taking derivatives $\frac{\partial y}{\partial x}$ for the various combinations of $y$ and $x$ as scalars, vectors, and matrices.

However, can't all of these rules be derived mechanically using Einstein tensor notation? Einstein tensor notation has the advantages of:

  1. Not needing to remember lists of rules, many complex and error-prone
  2. Generalizing to handle arbitrary cases of mixtures of matrices, vectors, and scalars
  3. Often smaller proofs?
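To illustrate what I mean by "mechanically derived", here is a sketch of how one standard cookbook rule, the gradient of the quadratic form $x^T A x$, falls out of index notation. I am assuming a flat (Euclidean) setting, so I write all indices as lower indices, sum over repeated indices, and use only the product rule and $\partial x_i / \partial x_k = \delta_{ik}$:

$$ \frac{\partial}{\partial x_k} \left( x_i A_{ij} x_j \right) = \delta_{ik} A_{ij} x_j + x_i A_{ij} \delta_{jk} = A_{kj} x_j + x_i A_{ik} = \left( (A + A^T) x \right)_k $$

This matches the usual matrix-calculus result $\frac{\partial (x^T A x)}{\partial x} = (A + A^T) x$, without having to look the rule up.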

So I really don't understand why matrix calculus is even taught in the first place. Can someone please help me see what I am missing?

EDIT: As an example of where I feel Einstein tensor notation shines, say we wish to calculate $\frac{\partial X}{\partial Y}$, where $X: \mathbb R^{n \times n} \rightarrow \mathbb R^{n \times n}$ is a matrix-valued function of the matrix $Y$. Such a calculation simply cannot be done with matrix calculus, since the resulting object has four indices, while matrix calculus can only "give outputs" in terms of scalars, vectors, and matrices. This is crazy limiting, since the theory doesn't even provide a complete solution to differentiating matrices!

We can of course work through the $(\epsilon-\delta)$ formalism to arrive at the answer. However, tensor notation makes this clear and easy (follow the rules of single-variable calculus!) since we are forced to talk about tensors "element-wise":

$$ \frac{ \partial X}{\partial Y} = \partial_{Y^{ij}} X^{ab} $$

which now has four indices $(i, j, a, b)$. If we now know that $Y = X$, for example, we can plug this information in:

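$$ \frac{ \partial X}{\partial Y} = \partial_{Y^{ij}} X^{ab} = \partial_{X^{ij}} X^{ab} = \delta_i^a \delta_j^b $$

The last step uses the fact that distinct entries of $X$ are independent variables, so the derivative is $1$ when $(a, b) = (i, j)$ and $0$ otherwise.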
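As a sanity check on the four-index result, here is a minimal numerical sketch in plain Python. It estimates $\partial_{Y^{ij}} X^{ab}$ by central differences and compares it with $\delta_i^a \delta_j^b$ for the case $X(Y) = Y$. The names `jacobian4` and `identity`, the step size `h`, and the $2 \times 2$ test matrix are my own illustrative choices, not part of any library.

```python
from copy import deepcopy


def jacobian4(f, Y, h=1e-6):
    """Central-difference estimate of J[a][b][i][j] = d f(Y)[a][b] / d Y[i][j].

    f maps an n x n matrix (list of lists) to an n x n matrix.
    """
    n = len(Y)
    # Four nested levels, indexed as J[a][b][i][j].
    J = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Perturb only the (i, j) entry of Y, up and down by h.
            Yp, Ym = deepcopy(Y), deepcopy(Y)
            Yp[i][j] += h
            Ym[i][j] -= h
            Fp, Fm = f(Yp), f(Ym)
            for a in range(n):
                for b in range(n):
                    J[a][b][i][j] = (Fp[a][b] - Fm[a][b]) / (2 * h)
    return J


def identity(Y):
    """The case X(Y) = Y from the example above."""
    return Y


Y = [[1.0, 2.0], [3.0, 4.0]]
n = len(Y)
J = jacobian4(identity, Y)

# Compare every entry against the product of Kronecker deltas.
is_delta = all(
    abs(J[a][b][i][j] - (1.0 if (a, b) == (i, j) else 0.0)) < 1e-6
    for a in range(n) for b in range(n)
    for i in range(n) for j in range(n)
)
print(is_delta)
```

The same `jacobian4` helper can be pointed at any other matrix function (for example, one that returns the matrix product of `Y` with itself) to check a hand-derived index expression entry by entry.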