Chain rule for partial matrix derivative.


Let $f: \mathbb{R}^a \rightarrow \mathbb{R}$ and $z: \mathbb{R}^b \rightarrow \mathbb{R}^a$ be functions of vectors.

I know that the partial derivative $\frac{\partial f}{\partial x}$ is computed using the chain rule $$\frac{\partial f}{\partial x} = \sum_i \frac{\partial f}{\partial z_i}\frac{\partial z_i}{\partial x}$$

Does this generalize to matrices? That is, do I need to sum over all entries of the matrix,

$$\frac{\partial f}{\partial x} = \sum_{i,j} \frac{\partial f}{\partial z_{i,j}}\frac{\partial z_{i,j}}{\partial x}$$

assuming $z$ and $x$ are defined on some real matrix space?


There are 3 best solutions below

0
On BEST ANSWER

There's something a little wonky with your formula: $x$ is a vector, so the derivative has to be taken with respect to one of its components $x_k$, and what is being differentiated is the composition $f\circ z$. I think it should be $$\frac{\partial (f\circ z)}{\partial x_k} =\sum_i \frac{\partial f}{\partial z_i} \frac{\partial z_i}{\partial x_k}$$ To answer your question, though: yes, it should be the same, because when we take matrix derivatives we are implicitly identifying $\text{Mat}_{n\times m}(\mathbb{R})$ with $\mathbb{R}^{nm}$ and taking the derivative in Euclidean space.
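As a quick sanity check of the sum over all matrix entries, here is a minimal NumPy sketch. The particular functions are my own assumptions for illustration, not anything from the question: $z(X) = AXB$ with hypothetical constant matrices `A` and `B`, and $f(Z) = \sum_{i,j} Z_{i,j}^2$. For this choice $\partial z_{i,j}/\partial x_{k,l} = A_{i,k}B_{l,j}$, and the chain-rule sum is compared against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))   # hypothetical constant matrix
B = rng.normal(size=(4, 5))   # hypothetical constant matrix
X = rng.normal(size=(2, 4))   # the matrix we differentiate with respect to

def z(X):
    return A @ X @ B          # Z has shape (3, 5)

def f(Z):
    return np.sum(Z ** 2)     # scalar-valued function of a matrix

Z = z(X)
df_dZ = 2 * Z                 # entry (i, j) is df/dZ_ij

# Chain rule: df/dX_kl = sum over i, j of df/dZ_ij * dZ_ij/dX_kl,
# where dZ_ij/dX_kl = A_ik * B_lj for this particular z.
grad_chain = np.zeros_like(X)
for k in range(X.shape[0]):
    for l in range(X.shape[1]):
        for i in range(Z.shape[0]):
            for j in range(Z.shape[1]):
                grad_chain[k, l] += df_dZ[i, j] * A[i, k] * B[l, j]

# Independent check: central finite differences, one entry of X at a time.
eps = 1e-6
grad_fd = np.zeros_like(X)
for k in range(X.shape[0]):
    for l in range(X.shape[1]):
        E = np.zeros_like(X)
        E[k, l] = eps
        grad_fd[k, l] = (f(z(X + E)) - f(z(X - E))) / (2 * eps)

print(np.allclose(grad_chain, grad_fd, atol=1e-5))
```

The four nested loops are deliberately naive so that they mirror the double sum over $i,j$ term by term.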

0
On

Do you mean that the space $\mathbb R^a$ is replaced by something like $M_{n, m}(\mathbb R)$, the space of $n \times m$ matrices with real entries? Then yes -- as far as calculus is concerned, $M_{n, m}(\mathbb R) = \mathbb R^{n \times m}$, and you just iterate over the components of $\mathbb R^{n \times m}$, which correspond to the matrix entries of the matrices in $M_{n, m}(\mathbb R)$.
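To make the identification with $\mathbb R^{nm}$ concrete, here is a small NumPy sketch that flattens the matrices into vectors and then applies the ordinary vector chain rule. The map $z(X) = AXB$, the constant matrices `A` and `B`, and $f(z) = \sum_i z_i^2$ are assumptions chosen only for illustration; for that map, the Jacobian of the flattened function happens to be a Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2))   # hypothetical constant matrix
B = rng.normal(size=(4, 5))   # hypothetical constant matrix
X = rng.normal(size=(2, 4))   # input matrix, viewed as a vector in R^8

Z = A @ X @ B                 # output matrix, viewed as a vector in R^15

# Jacobian of the flattened map x -> z. With NumPy's row-major flattening,
# flatten(A X B) = kron(A, B.T) @ flatten(X).
J = np.kron(A, B.T)           # shape (15, 8)
print(np.allclose(J @ X.ravel(), Z.ravel()))

# Ordinary vector chain rule in the flattened space:
# df/dx_k = sum over i of df/dz_i * dz_i/dx_k, i.e. J^T times the gradient of f.
df_dz = 2 * Z.ravel()         # gradient of f(z) = sum(z**2) in R^15
grad_flat = J.T @ df_dz
grad = grad_flat.reshape(X.shape)   # back to the matrix indexing of X

# Compare with the closed form for this f and z.
print(np.allclose(grad, 2 * A.T @ Z @ B.T))
```

The only matrix-specific step is the final `reshape`, which just restores the two-index labelling of the entries.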

0
On

As the other answers mention, yes, everything works as intended, and the reason is that in some sense you are still working within the realm of multivariate calculus. Even though everything has technically been expressed in terms of matrices, if you dive into the calculus being done, the matrices effectively just represent a fancy indexing scheme for your variables.

Using a notation slightly more suggestive of the operations you are doing, you could even say $$\frac{\partial f}{\partial x}=\frac{\partial f}{\partial z}\frac{\partial z}{\partial x}.$$ The product of the matrix $\frac{\partial f}{\partial z}$ with the tensor $\frac{\partial z}{\partial x}$ will have the intended behavior, provided the product is read as a contraction (a sum) over every index of $z$.
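One way to see the product $\frac{\partial f}{\partial z}\frac{\partial z}{\partial x}$ as a contraction is with `np.einsum`. This is only a sketch under assumptions of my own: $z(X) = AXB$ with hypothetical constant matrices `A` and `B`, and $f(Z) = \sum_{i,j}\sin Z_{i,j}$. Here $\frac{\partial z}{\partial x}$ is stored as a 4-index array and contracted with the matrix $\frac{\partial f}{\partial z}$ over the two indices of $z$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 2))   # hypothetical constant matrix
B = rng.normal(size=(4, 5))   # hypothetical constant matrix
X = rng.normal(size=(2, 4))

Z = A @ X @ B                 # shape (3, 5)

# df/dZ for f(Z) = sum(sin(Z)): a matrix with the same shape as Z.
df_dZ = np.cos(Z)

# dZ/dX as a 4-index tensor: dZ_dX[i, j, k, l] = dZ_ij / dX_kl = A_ik * B_lj.
dZ_dX = np.einsum('ik,lj->ijkl', A, B)   # shape (3, 5, 2, 4)

# Contract over both indices of Z (i and j); what remains is indexed like X.
df_dX = np.einsum('ij,ijkl->kl', df_dZ, dZ_dX)

# Compare with the closed form A^T cos(Z) B^T for this f and z.
print(np.allclose(df_dX, A.T @ np.cos(Z) @ B.T))
```

In practice one rarely materializes the full 4-index tensor, since it grows quickly with the matrix sizes; it is built here only to make the index bookkeeping explicit.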

If these are the kinds of questions you find yourself asking lately, I would highly recommend the Matrix Cookbook. You can buy a physical copy, or you can reference the PDF the authors host. They cover more kinds of vector and matrix derivatives than you can shake a stick at.