To the experts in linear algebra out there: my background in linear algebra emphasized doing matrix derivatives by computing the Fréchet derivative. I have posted a question about this before (Making sense of matrix derivative formula for determinant of symmetric matrix as a Fréchet derivative?), and others have also observed (Understanding notation of derivatives of a matrix) that the formulae for derivatives in the Matrix Cookbook have some implicit trickery in them.
So my question is: can I derive the Matrix Cookbook formulae in general by following the calculation of a Fréchet derivative, or are those formulae not reconcilable? I know that the ML, statistics, and EE communities use the Cookbook a lot, so I want to make sure I am not missing out on a valuable resource. I also invite the purists to weigh in on this.
Given a map between two Banach spaces, one calculates its Fréchet derivative, a linear map at a given point. In the special case of a functional on an inner-product space, one can then introduce the gradient vector: the unique vector $\nabla f(x)$ satisfying $Df(x)[h]=\langle \nabla f(x), h\rangle$ for all $h$. An ongoing confusion, on this site too, arises from mixing up the derivative and the gradient, and the book you mention seems to suffer from it as well. For example, when they write $\frac{\partial a^Tx}{\partial x}=a$ they no longer mean the derivative $D(a^Tx)=a^T$ but the gradient vector coming from the dot product. Converting your honest derivative to the gradient should lead to the reconciliation.
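To illustrate the point numerically, here is a small NumPy sketch of my own (the random values are arbitrary, not from the Cookbook): for $f(x)=a^Tx$ the Fréchet derivative is the linear map $h\mapsto a^Th$ and the gradient representing it is $a$; for $f(X)=\det(X)$ the derivative is $H\mapsto \det(X)\,\operatorname{tr}(X^{-1}H)$ and the Cookbook's entry $\det(X)X^{-T}$ is its gradient with respect to the Frobenius inner product.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-6

# Case 1: f(x) = a^T x.  Derivative: h -> a^T h.  Gradient: a.
a, x, h = rng.standard_normal((3, 5))
f = lambda v: a @ v
directional = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)  # Df(x)[h]
assert np.isclose(directional, a @ h)  # Df(x)[h] = <a, h>

# Case 2: f(X) = det(X).  Derivative: H -> det(X) tr(X^{-1} H).
# Cookbook "derivative" det(X) X^{-T} is the gradient for <A, B> = tr(A^T B).
X = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))
g = np.linalg.det
directional = (g(X + eps * H) - g(X - eps * H)) / (2 * eps)  # Dg(X)[H]
G = g(X) * np.linalg.inv(X).T                                # gradient of det
assert np.isclose(directional, np.sum(G * H))  # Dg(X)[H] = <G, H>_F
```

In both cases the finite-difference directional derivative matches the inner product of the gradient with the direction, which is exactly the conversion described above.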