Why is a matrix-by-matrix derivative actually a tensor?


For matrices $A$ and $B$, I thought $\frac{\partial A}{\partial B}$ was a matrix $C$ with entries $C_{ij} = \frac{\partial A_{ij}}{\partial B_{ij}}$.

However, when I use this matrix calculus website, it says

$$\frac{\partial{A}}{\partial{A}} = \mathbb{I}\otimes\mathbb{I}$$

Why is it a tensor? How to select the elements from this tensor to get the kind of derivative $C$ that I thought about?
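(For concreteness, matrix-calculus tools typically vectorize: the derivative of $\operatorname{vec}(A)$ with respect to $\operatorname{vec}(A)$ is an $nm\times nm$ identity, which factors as a Kronecker product of identities. A minimal sketch, assuming NumPy and an arbitrary $2\times 3$ shape:)

```python
import numpy as np

n, m = 2, 3  # arbitrary matrix shape for illustration

# In the vectorized convention, d vec(A) / d vec(A) is the
# (n*m) x (n*m) identity matrix, which factors as I_m ⊗ I_n.
J = np.kron(np.eye(m), np.eye(n))

# The Kronecker product of identities is the larger identity.
assert np.allclose(J, np.eye(n * m))
```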


Best answer:

In general, defining $C_{i_1\cdots i_pj_1\cdots j_q}:=\frac{\partial A_{i_1\cdots i_p}}{\partial B_{j_1\cdots j_q}}$ shows that the derivative of a rank-$p$ tensor with respect to a rank-$q$ tensor is a rank-$(p+q)$ tensor. If any index on $A$ matches an index on $B$, then under the Einstein summation convention you are not computing a single element of $C$; you are summing over all values of each repeated index. This is true even if one tensor is proportional to the other. Returning to your example with matrices: if a scalar $c$ exists with $A_{ij}=cB_{ij}$, then $\frac{\partial A_{ij}}{\partial B_{kl}}=c\delta_{ik}\delta_{jl}$ (provided the entries within each matrix are independent, so they aren't e.g. constrained to be symmetric), but $\frac{\partial A_{ij}}{\partial B_{ij}}=c\delta_{ii}\delta_{jj}$ is $c$ times the number of entries per matrix. In particular, taking $B=A$ (so $c=1$) gives $\frac{\partial A_{ij}}{\partial A_{kl}}=\delta_{ik}\delta_{jl}$, which is exactly the $\mathbb{I}\otimes\mathbb{I}$ your website reports.
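The rank-4 derivative tensor and the repeated-index pitfall can be checked numerically. A short sketch, assuming NumPy and arbitrary illustrative values $n=2$, $m=3$, $c=5$:

```python
import numpy as np

n, m, c = 2, 3, 5.0  # illustrative shape and proportionality constant

# Rank-4 derivative tensor: C[i,j,k,l] = dA_ij / dB_kl = c * δ_ik * δ_jl
C = c * np.einsum('ik,jl->ijkl', np.eye(n), np.eye(m))

# A single element, with all four indices distinct labels:
assert C[0, 1, 0, 1] == c  # dA_01 / dB_01 = c

# Repeating the indices (i=k, j=l) and summing, as the Einstein
# convention dictates, gives c times the number of entries per matrix:
total = np.einsum('ijij->', C)
assert total == c * n * m
```

The `'ijij->'` contraction is exactly the $c\,\delta_{ii}\delta_{jj}$ expression from above: it traces over both repeated index pairs rather than selecting one element.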