I am looking at the Matrix Cookbook. From my real analysis background, my understanding is that derivatives involving matrices are computed as Fréchet derivatives between the normed space $(\mathbb{R}^{n \times n}, \|\cdot\|_{op})$ and whatever the target space is, but I am having a hard time linking this to what is used in this book.
For example, consider the matrix trace $\text{Tr}: \mathbb{R}^{n \times n} \rightarrow \mathbb{R}$. This is a linear map, so its Fréchet derivative at any point ${\bf{X}} \in \mathbb{R}^{n \times n}$ is the map itself, and in the direction ${\bf{V}} \in \mathbb{R}^{n \times n}$ we get
$$\text{d}\text{Tr}({\bf{X}}){\bf{V}} = \text{Tr}({\bf{V}})$$
whereas in the Matrix Cookbook, the following identity is stated
$$\displaystyle\frac{\partial}{\partial {\bf{X}}}\text{Tr}({\bf{X}}) = {\bf{I}}$$
I suppose this has the same property of being independent of ${\bf{X}}$, but it is a matrix rather than a linear map, so it is not the same object. Another example is the function $f({\bf{X}}) = {\bf{X}}^{-1}$, which has Fréchet derivative $\text{d}f({\bf{X}}){\bf{V}} = -{\bf{X}}^{-1}{\bf{V}}{\bf{X}}^{-1}$. The Matrix Cookbook (MC) states an extremely similar-looking identity: $$\frac{\partial{\bf{X}}^{-1}}{\partial x} = -{\bf{X}}^{-1}\frac{\partial{\bf{X}}}{\partial x}{\bf{X}}^{-1}$$
My question is: what are the definitions of $\displaystyle\frac{\partial}{\partial {\bf{X}}}$, $\partial{\bf{X}}$, and exotic expressions such as $\partial{\lambda_i}$ and $\partial{\bf{v}}_i$ (where $\lambda_i, {\bf{v}}_i$ are the eigenvalues and eigenvectors of a real symmetric matrix)? I am also curious whether there is a nice geometric interpretation or analogue to derivatives in Banach spaces, and why these specialised derivatives might be preferred over a Fréchet derivative in applications.
I have found a similar question with some answers here, but I did not find them particularly enlightening. Any insightful answers are much appreciated.
Notice the disclaimers on the 2nd page:
A collection like that is bound to be sometimes confusing and (self-)contradictory.
1) It seems that when there is an inner product the formulas in MC give you the gradient. That's your first example with trace, see @Rodrigo's comment.
2) When they can't use any inner product to convert it to a gradient, they seem to give you the derivative as if using the chain rule and writing $\tfrac{\partial}{\partial X}$ instead of $d$, for example,
$$ d(X^{-1})=-X^{-1}\; dX\; X^{-1} $$ where of course $dX=id$ (the derivative of the identity map $X \mapsto X$), and so $$ d(X^{-1})(V)=-X^{-1}\; V\; X^{-1} $$
3) And then there are also formulas given in coordinate form, with indices etc., or in terms of eigenvalues etc.
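To make item 1 concrete, here is a short worked version of the trace example. It assumes the inner product in question is the Frobenius one, $\langle A, B\rangle = \text{Tr}(A^{T}B)$; MC does not say this in these words, so treat it as my reading. The gradient $\nabla f(X)$ is then the unique matrix satisfying

$$ df(X)(V) = \langle \nabla f(X), V\rangle = \text{Tr}\big(\nabla f(X)^{T}\, V\big) \quad \text{for all } V. $$

For $f = \text{Tr}$ the Fréchet derivative is $d\,\text{Tr}(X)(V) = \text{Tr}(V)$, and

$$ \text{Tr}(V) = \text{Tr}(I^{T} V) = \langle I, V\rangle, $$

so $\nabla \text{Tr}(X) = I$, which matches the MC identity $\tfrac{\partial}{\partial X}\text{Tr}(X) = I$. Under this reading the two statements describe the same linear functional, once as a map $V \mapsto \text{Tr}(V)$ and once as the matrix representing it.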
I consider it a good raw resource of reference formulas that are not incorrect. I just know that I first need to rewrite them or, better, re-derive them myself.
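One way to sanity-check a re-derived formula is to compare it with a finite difference. The following is a minimal NumPy sketch, not anything taken from MC: the matrix size, the random seed, the step `t`, and the helper name `frechet_inverse` are all illustrative assumptions. It checks the formula from item 2 and the trace/gradient pairing from item 1 (again assuming the Frobenius inner product).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Random matrix shifted by n*I so that it is comfortably invertible (illustrative choice).
X = rng.standard_normal((n, n)) + n * np.eye(n)
V = rng.standard_normal((n, n))  # direction of differentiation
t = 1e-6                         # finite-difference step (assumed, not tuned)


def frechet_inverse(X, V):
    """Closed form d(X^{-1})(V) = -X^{-1} V X^{-1} (hypothetical helper)."""
    Xinv = np.linalg.inv(X)
    return -Xinv @ V @ Xinv


# Central difference quotient approximating the derivative of X -> X^{-1} along V.
fd_inverse = (np.linalg.inv(X + t * V) - np.linalg.inv(X - t * V)) / (2 * t)
err_inverse = np.max(np.abs(fd_inverse - frechet_inverse(X, V)))

# Trace: the directional derivative is Tr(V); pairing the gradient I with V
# under the Frobenius inner product <A, B> = Tr(A^T B) should give the same number.
fd_trace = (np.trace(X + t * V) - np.trace(X - t * V)) / (2 * t)
pairing = np.sum(np.eye(n) * V)  # <I, V>_F
err_trace = abs(fd_trace - pairing)

print("inverse formula error:", err_inverse)
print("trace gradient error: ", err_trace)
```

Both errors should be small compared with the size of the derivatives themselves; if a re-derived formula has a wrong sign or a misplaced transpose, a check like this usually exposes it immediately.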