Notion of derivative used in Petersen & Pedersen's Matrix Cookbook


I am looking at the Matrix Cookbook. From my real analysis background, my understanding of calculating derivatives involving matrices is to use the Fréchet derivative on the normed space $(\mathbb{R}^{n \times n}, \|\cdot\|_{op})$ and whatever the target space is, but I am having a hard time linking this to what is used in this book.

For example, consider the matrix trace $\text{Tr}: \mathbb{R}^{n \times n} \rightarrow \mathbb{R}$. This is a linear map, so its Fréchet derivative at any point ${\bf{X}} \in \mathbb{R}^{n \times n}$ is the map itself, and in direction ${\bf{V}} \in \mathbb{R}^{n \times n}$ we get

$$\text{d}\text{Tr}({\bf{X}}){\bf{V}} = \text{Tr}({\bf{V}})$$

whereas in the Matrix Cookbook, the following identity is stated

$$\displaystyle\frac{\partial}{\partial {\bf{X}}}\text{Tr}({\bf{X}}) = {\bf{I}}$$

I suppose this has the same property of being independent of ${\bf{X}}$, but it is not the same object. Another example is the function $f({\bf{X}}) = {\bf{X}}^{-1}$, which has Fréchet derivative $\text{d}f({\bf{X}}){\bf{V}} = -{\bf{X}}^{-1}{\bf{V}}{\bf{X}}^{-1}$. The Matrix Cookbook (MC) states an extremely similar-looking identity: $$\frac{\partial{\bf{X}}^{-1}}{\partial x} = -{\bf{X}}^{-1}\frac{\partial{\bf{X}}}{\partial x}{\bf{X}}^{-1}$$

My question is: what is the definition of $\displaystyle\frac{\partial}{\partial {\bf{X}}}$, $\partial{\bf{X}}$, and exotic expressions such as $\partial{\lambda_i}$ and $\partial{\bf{v}}_i$ (where $\lambda_i, {\bf{v}}_i$ are the eigenvalues and eigenvectors of a real symmetric matrix)? I am also curious whether there is a nice geometric interpretation of, or analogue to, derivatives in Banach spaces, and why these specialised derivatives might be preferred over a Fréchet derivative in applications.

I have found a similar question with some answers here, but I did not find them particularly enlightening. Any insightful answers are much appreciated.


There is 1 best solution below

On BEST ANSWER

Notice the disclaimers on the 2nd page:

"The project of keeping a large repository of relations involving matrices is naturally ongoing"

"Disclaimer: The identities, approximations and relations presented here were obviously not invented but collected, borrowed and copied from a large amount of sources. These sources include similar but shorter notes found on the internet and appendices in books - see the references for a full list"

A collection like that is bound to be sometimes confusing and (self-)contradictory.

1) It seems that when there is an inner product, the formulas in MC give you the gradient. That is your first example with the trace; see @Rodrigo's comment.

2) When they can't use an inner product to convert the derivative into a gradient, they seem to give you the derivative as obtained from the chain rule, writing $\tfrac{\partial}{\partial X}$ instead of $d$. For example,

$$ d(X^{-1})=-X^{-1}\; dX\; X^{-1} $$ where $dX$ is the differential of the identity map $X \mapsto X$, so $dX(V)=V$, and therefore $$ d(X^{-1})(V)=-X^{-1}\; V\; X^{-1} $$

3) And then there are also formulas given in a coordinate-form, with indices etc. or with eigenvalues etc.
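To make 1) and 2) concrete, here is a short worked computation. It assumes the Frobenius inner product $\langle A, B\rangle = \text{Tr}(A^{\top}B)$ on $\mathbb{R}^{n \times n}$, which is my guess at what MC intends rather than something taken from the text. For the trace,

$$ d\,\text{Tr}(X)(V) = \text{Tr}(V) = \text{Tr}(I^{\top}V) = \langle I, V\rangle, $$

so the gradient of $\text{Tr}$ with respect to this inner product is $I$, which matches the MC identity in the question. For the inverse, suppose $X = X(x)$ depends differentiably on a scalar $x$ and write $f(X) = X^{-1}$. The chain rule then gives

$$ \frac{\partial X^{-1}}{\partial x} = df(X)\!\left(\frac{\partial X}{\partial x}\right) = -X^{-1}\,\frac{\partial X}{\partial x}\,X^{-1}, $$

which is the Fréchet derivative evaluated in the direction $V = \tfrac{\partial X}{\partial x}$.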

I consider it a good raw source of reference formulas that are not incorrect. I just know that I first need to rewrite them or, better, re-derive them myself.
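One way to sanity-check a re-derived formula is a finite-difference comparison. The sketch below is only an illustration under my own assumptions: it uses plain Python lists, restricts itself to $2 \times 2$ matrices, and the helper names (`matmul`, `inv2`, `axpy`, `frobenius`) and the test values `X`, `V`, `h` are made up for this example. It compares central differences against $\langle I, V\rangle$ for the trace and against $-X^{-1}VX^{-1}$ for the inverse.

```python
# Finite-difference checks of two identities on 2x2 matrices.
# Plain Python lists are used so the sketch has no dependencies.

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def axpy(A, t, V):
    """Return A + t*V entrywise."""
    return [[a + t * v for a, v in zip(ra, rv)] for ra, rv in zip(A, V)]

def frobenius(A, B):
    """Frobenius inner product <A, B> = Tr(A^T B) = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

X = [[2.0, 1.0], [0.5, 3.0]]    # arbitrary invertible point (assumed)
V = [[0.3, -1.2], [0.7, 0.4]]   # arbitrary direction (assumed)
I = [[1.0, 0.0], [0.0, 1.0]]
h = 1e-6                        # finite-difference step (assumed)

# Trace: central difference in direction V versus <I, V>.
fd_trace = (trace(axpy(X, h, V)) - trace(axpy(X, -h, V))) / (2 * h)
trace_err = abs(fd_trace - frobenius(I, V))

# Inverse: central difference in direction V versus -X^{-1} V X^{-1}.
Xi = inv2(X)
exact = [[-e for e in row] for row in matmul(matmul(Xi, V), Xi)]
plus, minus = inv2(axpy(X, h, V)), inv2(axpy(X, -h, V))
fd_inv = [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
          for rp, rm in zip(plus, minus)]
inv_err = max(abs(f - e)
              for rf, rx in zip(fd_inv, exact)
              for f, e in zip(rf, rx))

print(trace_err, inv_err)
```

Both printed errors should be tiny compared with the size of the entries involved. A large discrepancy usually signals a transposition or sign convention that differs from the one you assumed.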