I need to calculate the gradient of the scalar $y^Ty$ with respect to the vector $x$: $$y= (((R((G^T((Exl^T)\circ P))\circ W))\circ S)\circ H)l$$ $$\frac{\partial y^Ty}{\partial x}=\frac{2}{u^Tu}jj^Tx?$$ where $y$, $x$ and $l$ are vectors and the rest of the terms are matrices:
*Dimensions:* $R:(g\times g)$, $G:(e\times g)$, $E:(e\times k)$, $x:(k\times 1)$, $l:(p\times 1)$, $P:(e\times p)$, $W,S,H:(g\times p)$
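As a sanity check on the dimensions, here is a small NumPy sketch of the forward map (the sizes $g,e,k,p$ are arbitrary illustrative choices; note that $Ex$ requires $x$ to be a column vector, i.e. $k\times 1$):

```python
import numpy as np

rng = np.random.default_rng(0)
g, e, k, p = 3, 4, 5, 6           # arbitrary small sizes

R = rng.standard_normal((g, g))
G = rng.standard_normal((e, g))
E = rng.standard_normal((e, k))
P = rng.standard_normal((e, p))
W, S, H = (rng.standard_normal((g, p)) for _ in range(3))
x = rng.standard_normal(k)        # x as a (k,) column vector so E @ x is defined
l = rng.standard_normal(p)

# y = (((R((G^T((E x l^T) ∘ P)) ∘ W)) ∘ S) ∘ H) l,  ∘ = elementwise product
A = np.outer(E @ x, l) * P        # (e × p)
B = (G.T @ A) * W                 # (g × p)
y = ((R @ B) * S * H) @ l         # (g,)
print(y.shape)                    # (3,)  with g = 3
```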
Following this elegant solution involving traces, it can be shown that:
$$u^Ty=j:x \implies u^Ty=j^Tx$$ where: $$j=E^T((G((R^T((ul^T)\circ H\circ S))\circ W))\circ P)l$$
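This identity is easy to verify numerically. The following NumPy sketch (arbitrary small dimensions, random data, $u$ an arbitrary fixed vector) checks $u^Ty=j^Tx$:

```python
import numpy as np

rng = np.random.default_rng(1)
g, e, k, p = 3, 4, 5, 6                       # arbitrary small sizes

R = rng.standard_normal((g, g)); G = rng.standard_normal((e, g))
E = rng.standard_normal((e, k)); P = rng.standard_normal((e, p))
W, S, H = (rng.standard_normal((g, p)) for _ in range(3))
x = rng.standard_normal(k); l = rng.standard_normal(p)
u = rng.standard_normal(g)                    # arbitrary fixed vector

# forward map from the question
y = ((R @ ((G.T @ (np.outer(E @ x, l) * P)) * W)) * S * H) @ l

# j = E^T((G((R^T((u l^T) ∘ H ∘ S)) ∘ W)) ∘ P) l
j = E.T @ ((G @ ((R.T @ (np.outer(u, l) * H * S)) * W)) * P) @ l

print(np.allclose(u @ y, j @ x))              # True: u^T y = j^T x
```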
I am using a little trick to calculate the gradient of $y^Ty$ with respect to $x$:
$$y^Ty\,u^Tu=yy^T:uu^T = u^Tyy^Tu=(u^Ty)(u^Ty) \implies$$ $$y^Ty\,u^Tu=(j^Tx)^Tj^Tx=x^Tjj^Tx \implies$$ $$y^Ty=\frac{1}{u^Tu}x^Tjj^Tx\implies$$ $$\frac{\partial y^Ty}{\partial x}=\frac{2}{u^Tu}jj^Tx$$
I get to the same solution by applying the product rule:
$$\partial (y^Ty)=\partial (y^T)y+(y^T(\partial (y))^T)^T$$ $$\frac{\partial y^Tu}{\partial x}=\frac{\partial u^Ty}{\partial x}=j$$ $$\partial (y^Tu)=\partial (y^T)u+(y^T(\partial (u))^T)^T \implies \frac{\partial(y^T)}{\partial x}=ju^T(uu^T)^+$$ $$\partial (u^Ty)=\partial (u^T)y+(u^T(\partial (y))^T)^T \implies \frac{\partial(y)}{\partial x}=ju^T(uu^T)^+$$ I believe we can write $y$ as: $$y=(uu^T)^+uj^Tx$$ From the above equations, I get again: $$\frac{\partial y^Ty}{\partial x}=\frac{2}{u^Tu}jj^Tx$$
I think that I am doing something wrong, because combining this derivation with a couple of others I end up with many solutions, despite having more equations than unknowns. My hunch is that where I have $jj^T$, I should have a full-rank $k\times k$ matrix instead. Any ideas are welcome! Thanks.
(I had a typo in the dimensions of $E$; apologies if this caused anyone trouble before it was fixed.)
Vectorization followed by diagonalization will prove to be a useful operation, so let's create a notation for it. $$\eqalign{ {\cal P} &= {\rm Diag}\Big({\rm vec}(P)\Big) \\ {\cal W} &= {\rm Diag}\Big({\rm vec}(W)\Big) \\ }$$ Define the matrices $$\eqalign{ A &= Exl^T\odot P \\ B &= W\odot G^TA \\ C &= S\odot H \\ }$$ Then consider their factorizations/vectorizations. $$\eqalign{ C &= \sum_k \sigma_ku_kv_k^T, \quad U_k = {\rm Diag}(\sigma_ku_k),\; V_k = {\rm Diag}(v_k) \\ C\odot Z &= \sum_kU_kZV_k \\ a &= {\rm vec}(A) \\ &= {\rm vec}(P)\odot{\rm vec}(Exl^T) \\ &= {\rm vec}(P)\odot(l\otimes E)x \\ &= {\cal P} (l\otimes E)x \\ b &= {\rm vec}(B) \\ &= {\rm vec}(W)\odot{\rm vec}(G^TA) \\ &= {\cal W}(I\otimes G^T)a \\ &= {\cal W}(I\otimes G^T){\cal P} (l\otimes E)x \\ &= Qx \\ }$$ Use all of this to write $y$ in a nicer form. $$\eqalign{ y &= (S\odot H\odot R(W\odot(G^T(Exl^T\odot P))))l \\ &= (S\odot H\odot R(W\odot(G^TA)))l \\ &= (C\odot RB)l \\ &= \sum_k {\rm vec}\Big(U_k RB V_k l\Big) = \sum_k (l^TV_k\otimes U_kR)\,b \\ &= Jb \\&= JQx \\ }$$ With this nice expression for $y$, finding the gradient of the objective function is simple. $$\eqalign{ \phi &= y^Ty \\ &= y:y \\ d\phi &= 2y:dy \\ &= 2y:JQ\,dx \\ &= 2\,Q^TJ^Ty:dx \\ \frac{\partial \phi}{\partial x} &= 2\,Q^TJ^Ty \\ &= 2\,(l^T\otimes E^T)\;{\cal P}\;(I\otimes G)\;{\cal W}\; \sum_k\Big(V_kl\otimes R^TU_k\Big)\;y \\ }$$ To relate this to my previous comments, note that $\;M=JQ$.
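If helpful, the whole construction can be checked numerically. The sketch below (NumPy, arbitrary small dimensions, column-major `vec` so that ${\rm vec}(ABC)=(C^T\otimes A)\,{\rm vec}(B)$ holds) builds $Q$ and $J$ exactly as above, confirms $y=JQx$, and compares the gradient $2\,Q^TJ^Ty$ against central finite differences of $\phi=y^Ty$:

```python
import numpy as np

rng = np.random.default_rng(2)
g, e, k, p = 3, 4, 5, 6                       # arbitrary small sizes

R = rng.standard_normal((g, g)); G = rng.standard_normal((e, g))
E = rng.standard_normal((e, k)); P = rng.standard_normal((e, p))
W, S, H = (rng.standard_normal((g, p)) for _ in range(3))
x = rng.standard_normal(k); l = rng.standard_normal(p)

def vec(M):                                   # column-stacking vec
    return M.flatten(order='F')

# Q = Diag(vec W) (I ⊗ G^T) Diag(vec P) (l ⊗ E)
Q = (np.diag(vec(W)) @ np.kron(np.eye(p), G.T)
     @ np.diag(vec(P)) @ np.kron(l[:, None], E))

# J = Σ_k (l^T V_k ⊗ U_k R), built from the SVD of C = S ∘ H
C = S * H
Us, sig, Vt = np.linalg.svd(C)
J = sum(np.kron((Vt[i] * l)[None, :], np.diag(sig[i] * Us[:, i]) @ R)
        for i in range(len(sig)))

y_direct = ((R @ ((G.T @ (np.outer(E @ x, l) * P)) * W)) * S * H) @ l
print(np.allclose(J @ Q @ x, y_direct))       # True: y = J Q x

grad = 2 * Q.T @ J.T @ y_direct               # 2 Q^T J^T y

def phi(xv):                                  # phi(x) = y^T y
    yv = ((R @ ((G.T @ (np.outer(E @ xv, l) * P)) * W)) * S * H) @ l
    return yv @ yv

h = 1e-6
fd = np.array([(phi(x + h*ei) - phi(x - h*ei)) / (2*h) for ei in np.eye(k)])
print(np.allclose(grad, fd, rtol=1e-4))       # True
```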
Update
The variables $(U_k,V_k)$ can be removed from the final expression since $$\eqalign{ \sum_k\Big(V_kl\otimes R^TU_k\Big)\;y &= {\rm vec}\bigg(\sum_k R^TU_kyl^TV_k\bigg) \\ &= {\rm vec}\Big(R^T\big(yl^T\odot C\big)\Big) = {\rm vec}\Big(R^TYCL\Big) \\ &\quad{\rm where}\quad Y = {\rm Diag}(y),\; L = {\rm Diag}(l) \\ }$$ This means that you don't need to perform an SVD to use the result.
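A quick numerical check of the SVD-free form: unvectorizing the Kronecker factors, the gradient appears to collapse to $2\,E^T\big(P\odot G(W\odot R^T(yl^T\odot C))\big)l$, which the sketch below (arbitrary small dimensions) compares against central finite differences of $\phi=y^Ty$:

```python
import numpy as np

rng = np.random.default_rng(3)
g, e, k, p = 3, 4, 5, 6                       # arbitrary small sizes

R = rng.standard_normal((g, g)); G = rng.standard_normal((e, g))
E = rng.standard_normal((e, k)); P = rng.standard_normal((e, p))
W, S, H = (rng.standard_normal((g, p)) for _ in range(3))
x = rng.standard_normal(k); l = rng.standard_normal(p)

def y_of(xv):                                 # forward map
    return ((R @ ((G.T @ (np.outer(E @ xv, l) * P)) * W)) * S * H) @ l

y, C = y_of(x), S * H
# gradient without any SVD: 2 E^T(P ∘ G(W ∘ R^T(y l^T ∘ C))) l
grad = 2 * E.T @ ((G @ ((R.T @ (np.outer(y, l) * C)) * W)) * P) @ l

h = 1e-6
fd = np.array([(y_of(x + h*ei) @ y_of(x + h*ei)
              - y_of(x - h*ei) @ y_of(x - h*ei)) / (2*h) for ei in np.eye(k)])
print(np.allclose(grad, fd, rtol=1e-4))       # True
```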