Matrix Calculus and Matrix Derivatives


Consider a map $f : \mathbb R^{n\times m} \to \mathbb R^{p \times l}$ between matrix spaces. What is the differential of such a mapping? I looked at a really simple example, $\operatorname{id} : \mathbb R^{n\times n} \to \mathbb R^{n\times n}$ given by $\operatorname{id}(X) = X$. Then (in analogy to the case $f : \mathbb R \to \mathbb R$ or $f : \mathbb R^n \to \mathbb R^n$) we should have $d\operatorname{id}(A) = I$ for all matrices $A$, where $I$ is the identity matrix (and $d\operatorname{id}$ denotes the differential, i.e. the best linear approximation map).

Now I read about matrix derivatives. For example, on Wikipedia the derivative of a mapping $F : M(n,m) \to M(p,q)$ between matrix spaces is said to be: $$ \frac{\partial\mathbf{F}} {\partial\mathbf{X}}= \begin{bmatrix} \frac{\partial\mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,1}}\\ \vdots & \ddots & \vdots\\ \frac{\partial\mathbf{F}}{\partial X_{1,m}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,m}}\\ \end{bmatrix} $$ Also, in the Matrix Cookbook the basic formula (on page 8) is written as $$ \frac{\partial X_{kl}}{\partial X_{ij}} = \delta_{ik}\delta_{lj} $$ (where $\delta_{ij}$ denotes the Kronecker delta), and this I guess is essentially the differentiation formula for the identity map. So if I apply this to the above map $\operatorname{id} : \mathbb R^{n\times n} \to \mathbb R^{n\times n}$ with $n = 2$, I get a $4\times 4$ matrix $$ \begin{pmatrix} \frac{\partial X}{\partial x_{11}} & \frac{\partial X}{\partial x_{21}} \\ \frac{\partial X}{\partial x_{12}} & \frac{\partial X}{\partial x_{22}} \end{pmatrix} = \begin{pmatrix} \frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{11}} & \frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{21}} \\ \frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{12}} & \frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{22}} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$ (where in the last matrix I have not marked the block boundaries). But this result is quite different from what I would intuitively expect, so what did I do wrong? Maybe I am interpreting all these matrix derivatives wrongly. Could someone please explain?


There are 2 best solutions below

5
On BEST ANSWER

The issue is that you first need to pick a basis before you write out the matrix representation of the derivative. The matrix above doesn't make sense as a derivative.

In general, I find it a little easier to avoid indices, if possible. Dealing with indices and bases can add unnecessary clutter.

In the above case, we have $F(X) = X$, so $F(X+H) = X+H$, and so we see that $DF(X)(H) = H$.
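To spell out the step just taken (a sketch using the standard definition of the Fréchet derivative, with $L$ denoting the candidate linear map $L(H) = H$ and $\|\cdot\|$ any norm on $\mathbb{R}^{n \times n}$, both of which are notation introduced here): the derivative $DF(X)$ is the linear map $L$ for which the remainder is small compared with $\|H\|$, and for the identity map the remainder vanishes identically, $$ \lim_{H \to 0} \frac{\|F(X+H) - F(X) - L(H)\|}{\|H\|} = \lim_{H \to 0} \frac{\|(X+H) - X - H\|}{\|H\|} = 0. $$ No basis or index was needed to reach this conclusion.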

Note that this is a map $\mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$, so if you want to express the derivative as a matrix, you need to pick a basis first. The resulting matrix will be an $n^2 \times n^2$ matrix and it will necessarily be the identity matrix, of course, since $DF(X)(B_k) = B_k$ for the basis elements.
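As a numerical sanity check, here is a minimal sketch in plain Python. It assumes the basis $E_{ij}$ ordered column by column (the usual "vec" ordering) and approximates $DF(X)(H)$ by a central difference; the helper names (`basis_matrix`, `vec`, `jacobian_matrix`, and so on) are made up for this illustration and do not come from any library.

```python
def basis_matrix(n, m, i, j):
    """E_ij: the n x m matrix with a 1 in row i, column j and 0 elsewhere."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(m)] for r in range(n)]


def vec(M):
    """Stack the columns of M into one flat list (column-major order)."""
    return [M[r][c] for c in range(len(M[0])) for r in range(len(M))]


def shifted(X, H, t):
    """Return X + t * H, entry by entry."""
    return [[x + t * h for x, h in zip(xr, hr)] for xr, hr in zip(X, H)]


def directional_derivative(F, X, H, eps=1e-6):
    """Central-difference approximation of DF(X)(H)."""
    plus, minus = F(shifted(X, H, eps)), F(shifted(X, H, -eps))
    return [[(p - q) / (2 * eps) for p, q in zip(pr, qr)]
            for pr, qr in zip(plus, minus)]


def jacobian_matrix(F, X):
    """Matrix of DF(X) with respect to the basis E_11, E_21, ..., E_nm."""
    n, m = len(X), len(X[0])
    # Column k is vec(DF(X)(B_k)) for the k-th basis element B_k.
    columns = [vec(directional_derivative(F, X, basis_matrix(n, m, i, j)))
               for j in range(m) for i in range(n)]
    return [list(row) for row in zip(*columns)]


def identity_map(X):
    """F(X) = X."""
    return [row[:] for row in X]


X0 = [[1.0, 2.0], [3.0, 4.0]]
# Round away the floating-point noise from the finite difference.
J = [[round(v) for v in row] for row in jacobian_matrix(identity_map, X0)]
for row in J:
    print(row)
```

Any other ordering of the basis would do as well, as long as the same ordering is used for the input and the output space; for the identity map the result is the $4 \times 4$ identity matrix either way.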

If one uses the inner product induced by the Frobenius norm, then one can write $\nabla F(X) = I$.

If $\phi(X) = [X]_{kl}$, a similar analysis shows that $D \phi(X)(H) = [H]_{kl}$, and to obtain the component along the $E_{ij} = e_ie_j^T$ direction, we look at $D \phi(X)(E_{ij}) = [E_{ij}]_{kl} = \delta_{ik}\delta_{jl}$.

6
On

The derivative of a function $A \rightarrow B$ is a function $A \rightarrow \operatorname{Lin}(A,B)$ that for each $x$ in $A$ gives a linear approximation to the function at $x$. So in your case that's $\mathbb{R}^{n\times n} \rightarrow \operatorname{Lin}(\mathbb{R}^{n\times n},\mathbb{R}^{n\times n})$. Since your $\operatorname{id}$ function is already linear, the approximation at any point is just the function itself. Your block matrix is just the matrix way of writing $\operatorname{id}$ itself. The top left $2 \times 2$ block says that the top left corner of the result of $\operatorname{id}(X)$ is simply equal to the top left corner of $X$, and similarly for the other $2 \times 2$ blocks. That block matrix notation is very confusing though (the indices should be transposed too: $i,j$ instead of $j,i$).
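To see where the matrix in the question comes from, here is a small sketch in plain Python that assembles the block layout quoted there for the identity map, using $\partial X / \partial X_{ij} = E_{ij}$. The function names are invented for this illustration, and the block placement (block row $b$, block column $a$ holds $\partial F / \partial X_{a,b}$) is read off from the formula quoted in the question.

```python
def partial_of_identity(n, i, j):
    """dX/dX_ij for F(X) = X: the matrix E_ij (entries delta_ik * delta_jl)."""
    return [[1 if (k, l) == (i, j) else 0 for l in range(n)] for k in range(n)]


def block_layout(n):
    """Assemble the n^2 x n^2 array of blocks in the layout quoted in the question."""
    size = n * n
    out = [[0] * size for _ in range(size)]
    for block_row in range(n):
        for block_col in range(n):
            # Block (row b, column a) holds dF/dX_{a,b}.
            block = partial_of_identity(n, block_col, block_row)
            for k in range(n):
                for l in range(n):
                    out[block_row * n + k][block_col * n + l] = block[k][l]
    return out


layout = block_layout(2)
for row in layout:
    print(row)
```

For $n = 2$ this reproduces the $4 \times 4$ matrix from the question. It contains the same partial derivatives $\partial X_{kl} / \partial X_{ij}$ as the identity matrix obtained from a basis, only placed in different positions, so it is a table of partial derivatives rather than the matrix of the linear map with respect to a single chosen basis.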