Consider the matrix equation $l=\mathbf x^\top\mathbf M \left(\mathbf M^\top \mathbf M+\mathbf I\right)^{-1}\mathbf M^\top\mathbf x$ and we need to find the derivative wrt matrix $\mathbf M$.
My idea was to use the chain rule and rewrite the inverse as $\mathbf D=\left(\mathbf M^\top \mathbf M+\mathbf I\right)^{-1}$. Then using eq. 82 of matrix cookbook $\frac{\partial}{\partial \mathbf M}\left[\mathbf x^\top\mathbf M \mathbf D\mathbf M^\top\mathbf x\right]=\mathbf D^\top\mathbf M^\top\mathbf x\mathbf x^\top+\mathbf D\mathbf M^\top\mathbf x\mathbf x^\top = 2\mathbf D\mathbf M^\top\mathbf x\mathbf x^\top$
Then the derivative of $\mathbf D$: $\frac{\partial\mathbf D}{\partial \mathbf M}=-\mathbf D2\mathbf M\mathbf D$, where the $2\mathbf M$ are the derivative of $\mathbf M^\top \mathbf M+\mathbf I$
Combining: $\frac{\partial l}{\partial \mathbf M}=\left[2\mathbf D\mathbf M^\top\mathbf x\mathbf x^\top\right]^\top\left[-\mathbf D2\mathbf M\mathbf D\right]=-4\mathbf x\mathbf x^\top\mathbf M \mathbf D\mathbf D\mathbf M\mathbf D$
I know that this is not correct but I can't see where my mistake is. Perhaps in the calculations of the derivative of the inverse?
$ \def\l{\ell} \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\sym#1{\op{sym}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} $For typing convenience, define the symmetric matrix $$X=xx^T$$ and the Frobenius $(:)$ product $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ \LR{AB}:C &= A:\LR{CB^T} \;=\; B:\LR{A^TC} \\ }$$ Then invoke the well-known rule for differentiating matrix inverses $$\eqalign{ D &= \LR{M^TM+I}^{-1} \\ dD &= -\LR{D\:\c{dD^{-1}}D} \;=\; -D\CLR{dM^TM+M^TdM}D }$$ Now calculate the gradient of the objective function $$\eqalign{ \l &= X:MDM^T \\ d\l &= X:\LR{\c{dM}\,DM^T + MD\,\c{dM^T} + M\,\c{dD}\,M^T} \\ &= XMD:\c{dM} + DM^TX:\c{dM^T} + M^TXM:\c{dD} \\ &= 2XMD:\c{dM} - M^TXM:\c{D\CLR{dM^TM+M^TdM}D} \\ &= 2XMD:\c{dM} - 2DM^TXMD:\CLR{M^TdM} \\ &= 2\LR{XMD - MDM^TXMD}:\c{dM} \\ &= 2\LR{I - MDM^T}XMD:\c{dM} \\ \grad{\l}{M} &= 2\LR{I - MDM^T}XMD \\ }$$ $\sf NB\!:$ Since $(D,X)$ are symmetric, I have omitted their transposes in several of the steps above.