The derivative formula in the Matrix Cookbook

As stated in the book, the basic assumption about matrix derivatives can be written as the formula

$$ \frac{\partial X_{kl}}{\partial X_{ij}} = \delta_{ik}\delta_{lj} $$

But I don't know how to use this formula to calculate matrix derivatives. Could anyone give some examples?

Best answer

The rule becomes more intuitive if you draw an analogy with an ordinary partial derivative between independent variables. Label the entry being differentiated by the index pair $(\alpha,\beta)$ and the entry you differentiate with respect to by the pair $(x,y)$. Because all entries are independent of each other, $$ \frac{\partial X_{\alpha\beta}}{\partial X_{xy}} = \begin{cases} 1 & \text{if}\ (\alpha,\beta) = (x,y)\\ 0 & \text{if}\ (\alpha,\beta) \ne (x,y) \end{cases} $$ You can combine the two cases by using the Kronecker delta: $$ \frac{\partial X_{\alpha\beta}}{\partial X_{xy}} = \delta_{\alpha x}\delta_{\beta y} $$
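To connect this to the matrix notation of the question, here is a short worked example. The constant matrix $A$ and the two functions $\operatorname{tr}(AX)$ and $\operatorname{tr}(X^{T}X)$ are assumptions chosen for illustration; they do not appear in the original post. The recipe is to write the function in index form, apply the delta rule to each factor of $X$, and let the deltas collapse the sums:

$$ \frac{\partial}{\partial X_{ij}} \operatorname{tr}(AX) = \frac{\partial}{\partial X_{ij}} \sum_{k,l} A_{kl} X_{lk} = \sum_{k,l} A_{kl}\,\delta_{il}\delta_{kj} = A_{ji} $$

$$ \frac{\partial}{\partial X_{ij}} \operatorname{tr}(X^{T}X) = \frac{\partial}{\partial X_{ij}} \sum_{k,l} X_{kl} X_{kl} = \sum_{k,l} 2\,X_{kl}\,\delta_{ik}\delta_{lj} = 2X_{ij} $$

In matrix form these read $\partial \operatorname{tr}(AX)/\partial X = A^{T}$ and $\partial \operatorname{tr}(X^{T}X)/\partial X = 2X$. The second case uses the product rule on a quadratic expression.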

As a demonstration, consider the following example from physics: a Lagrangian in which Einstein summation over repeated indices is implied.

$$ \mathcal{L} = \tfrac{1}{2} \left(\partial_{\mu} \varphi_{\nu}\right) \left(\partial_{\mu} \varphi_{\nu}\right) - \tfrac{1}{2}m^2\varphi_\mu\varphi_\mu $$

Take the derivative with respect to $\left(\partial_\alpha\varphi_\sigma\right)$, as one does when deriving the equations of motion. This quantity can be written as $\left(\partial\varphi\right)_{\alpha\sigma}$, an object with two indices, to be consistent with your question. The mass term does not depend on it, so only the first term contributes, and the product rule gives explicitly

$$ \frac{\partial\mathcal{L}} {\partial\left(\partial_\alpha\varphi_\sigma\right)} = \tfrac{1}{2}\left[ \left(\partial_{\mu} \varphi_{\nu}\right) \left( \frac{\partial}{\partial\left(\partial_\alpha\varphi_\sigma\right)} \partial_{\mu} \varphi_{\nu}\right) + \left( \frac{\partial}{\partial\left(\partial_\alpha\varphi_\sigma\right)} \partial_{\mu} \varphi_{\nu}\right) \left(\partial_{\mu} \varphi_{\nu}\right) \right] $$ Focusing on the derivative factor itself,

$$ \frac{\partial}{\partial\left(\partial_\alpha\varphi_\sigma\right)} \partial_{\mu} \varphi_{\nu} = \delta_{\alpha\mu}\delta_{\sigma\nu} $$

We can see how this interacts with the factor that the product rule left untouched: $$ \left( \frac{\partial}{\partial\left(\partial_\alpha\varphi_\sigma\right)} \partial_{\mu} \varphi_{\nu}\right) \left(\partial_{\mu} \varphi_{\nu}\right) = \delta_{\alpha\mu}\delta_{\sigma\nu} \partial_{\mu} \varphi_{\nu} = \partial_{\alpha} \varphi_{\sigma} $$ Both terms of the product rule give this same contribution, and the resulting factor of 2 cancels the prefactor $\tfrac{1}{2}$ of the kinetic term, so the full derivative is $$ \frac{\partial\mathcal{L}} {\partial\left(\partial_\alpha\varphi_\sigma\right)} = \partial_{\alpha} \varphi_{\sigma} $$

Note: I have taken the metric to be Euclidean, so $g^{\mu\nu}$ is 1 on the diagonal and 0 elsewhere. If you want to use a Minkowski metric such as $(1,-1,-1,-1)$, you have to be more careful with the signs.
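If you want to convince yourself of the result numerically, the following is a minimal sketch in plain Python. It treats the gradient $\left(\partial\varphi\right)_{\mu\nu}$ as an independent matrix `d` and compares a finite-difference derivative of the Lagrangian with the predicted value `d[alpha][sigma]`. The names `lagrangian`, `numerical_derivative`, `d`, `phi` and `m`, the dimension 4, and the random test values are all assumptions made for this illustration, and the Euclidean metric is assumed as in the note above.

```python
import random


def lagrangian(d, phi, m):
    """L = 1/2 d_{mu nu} d_{mu nu} - 1/2 m^2 phi_mu phi_mu (Euclidean metric).

    d[mu][nu] stands in for (partial_mu phi_nu), treated as an independent
    variable, as in the derivative with respect to the field gradient.
    """
    kinetic = 0.5 * sum(entry * entry for row in d for entry in row)
    mass = 0.5 * m ** 2 * sum(p * p for p in phi)
    return kinetic - mass


def numerical_derivative(d, phi, m, alpha, sigma, eps=1e-6):
    """Central finite difference of L with respect to the entry d[alpha][sigma]."""
    plus = [row[:] for row in d]
    minus = [row[:] for row in d]
    plus[alpha][sigma] += eps
    minus[alpha][sigma] -= eps
    return (lagrangian(plus, phi, m) - lagrangian(minus, phi, m)) / (2 * eps)


# Hypothetical test data: a random 4x4 "gradient" and a random 4-component field.
random.seed(0)
n = 4
d = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
phi = [random.uniform(-1, 1) for _ in range(n)]
m = 1.3

# The delta rule predicts dL/d(d[alpha][sigma]) = d[alpha][sigma] for every entry.
max_error = max(
    abs(numerical_derivative(d, phi, m, a, s) - d[a][s])
    for a in range(n)
    for s in range(n)
)
print("largest deviation from the predicted derivative:", max_error)
```

The printed deviation should be at the level of floating-point rounding. The mass term drops out of the difference because it does not depend on `d`, which mirrors the analytic calculation.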