What is $\nabla f$ with respect to $B$ and $A$ if
$$f=\|S-ABA^T \|^2$$
where $S\in \mathbb R^{n\times n}$, $A\in \mathbb R^{n\times k}$ and $B\in \mathbb R^{k\times k}$? What is $\nabla f$ with respect to $B$ only?
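If you use the Euclidean (Frobenius) norm for matrices, you have $$n(X)=\Vert X \Vert^2= \text{tr}(XX^T),$$ whose derivative at $X$ in the direction $h$ is $$n'(X).h=2\, \text{tr}(X^Th).$$
Now consider $$l(A,B)=S-ABA^T.$$ Its partial derivatives are $$\partial_A l(A,B).h =-hBA^T-ABh^T$$ and $$\partial_B l(A,B).h =-AhA^T.$$ You can now apply the chain rule to $$f(A,B)=(n \circ l)(A,B)$$ to get
$$\partial_A f(A,B).h =-2\,\text{tr}\big((S-ABA^T)^T(hBA^T+ABh^T)\big)$$ and
$$\partial_B f(A,B).h =-2\,\text{tr}\big((S-ABA^T)^T(AhA^T)\big).$$
Writing $R=S-ABA^T$ and reading each expression as $\text{tr}(G^Th)$, the cyclic property of the trace gives the gradients $\nabla_A f=-2\,(RAB^T+R^TAB)$ and $\nabla_B f=-2\,A^TRA$.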
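The two directional derivatives can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the sizes `n`, `k`, the random test matrices and the helper names `f`, `dfA`, `dfB` are illustrative choices, not part of the answer. It uses the trace pairing $\text{tr}(R^T\,\cdot)$ with $R=S-ABA^T$, and $S$ and $B$ are deliberately not symmetric so that the transpose matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3                        # illustrative sizes (assumption)
S = rng.standard_normal((n, n))    # deliberately not symmetric
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, k))    # deliberately not symmetric

def f(A, B):
    """Squared Frobenius norm of S - A B A^T."""
    R = S - A @ B @ A.T
    return np.sum(R * R)

def dfA(A, B, h):
    """Directional derivative of f in A along h (h is n x k)."""
    R = S - A @ B @ A.T
    return -2.0 * np.trace(R.T @ (h @ B @ A.T + A @ B @ h.T))

def dfB(A, B, h):
    """Directional derivative of f in B along h (h is k x k)."""
    R = S - A @ B @ A.T
    return -2.0 * np.trace(R.T @ (A @ h @ A.T))

# Random directions and central finite differences along them.
hA = rng.standard_normal((n, k))
hB = rng.standard_normal((k, k))
eps = 1e-6
fd_A = (f(A + eps * hA, B) - f(A - eps * hA, B)) / (2 * eps)
fd_B = (f(A, B + eps * hB) - f(A, B - eps * hB)) / (2 * eps)

print("A-direction:", fd_A, dfA(A, B, hA))
print("B-direction:", fd_B, dfB(A, B, hB))
```

Each printed pair should agree up to finite-difference error; dropping the transpose on `R` would generally break the agreement for non-symmetric `S` or `B`.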
First, recall that the trace of a matrix $X=(X_{ij})_{n\times n}\in \mathbb{R}^{n\times n}$ is $\operatorname{trace}(X)=\sum_{u=1}^nX_{uu}$. The Frobenius norm of $X$ is given in terms of the trace by
$$ \| X\|^2=\operatorname{trace}(X^TX) =\operatorname{trace}\Big(\big(\sum_{u=1}^{n}X_{ui}\cdot X_{uj}\big)_{n\times n}\Big) =\sum_{v=1}^{n}\sum_{u=1}^{n}X_{uv}^2, $$
and the Frobenius inner product on the matrix space $\mathbb{R}^{n\times n}$ (and likewise on $\mathbb{R}^{k\times k}$) is
$$ \langle X,Y\rangle = \operatorname{trace}(X^TY)=\sum_{i}\sum_{j}X_{ij}Y_{ij}. $$
In the case of your question we have
$$ \| S - ABA^{T}\|^2 = \operatorname{trace}\big((S - ABA^T)^{T}(S - ABA^{T})\big). $$
Note that
\begin{align} (S - ABA^T)^{T}(S - ABA^{T}) = & (S^T - AB^TA^T)(S - ABA^T) \\ =& S^TS-S^TABA^{T}-AB^TA^{T}S+AB^TA^{T}ABA^{T}. \end{align}
Then
$$ f(A,B)=\| S-ABA^T\|^2= \operatorname{trace}(S^TS) -\operatorname{trace}(S^TABA^T) -\operatorname{trace}(AB^TA^TS) +\operatorname{trace}(AB^TA^TABA^T). $$
Fix $A$ and $B$. By definition, the gradient $\nabla_Bf(A,B)$ of the map $f$ at $(A,B)$ with respect to $B$ is the matrix representing the linear map (the derivative)
$$ \begin{array}{rrcl} D_Bf(A,B) &: \mathbb{R}^{k\times k}&\longrightarrow &\mathbb{R} \\ \quad & V&\longmapsto & \langle\nabla_Bf(A,B), V\rangle \end{array} $$
that satisfies the condition
$$ f(A,B+V)-f(A,B)=\langle \nabla_B f(A,B), V\rangle+ \|V\|\cdot r(V) $$
for some function $r:\mathbb{R}^{k\times k}\to \mathbb{R}$ such that $\lim_{V\to 0}r(V)=0$.
Expanding the difference gives
\begin{align} f(A,B+V)-f(A,B) =& -\operatorname{trace}(S^TA(B+V)A^T)+\operatorname{trace}(S^TABA^T) \\ & -\operatorname{trace}(A(B+V)^TA^TS)+\operatorname{trace}(AB^TA^TS) \\ & +\operatorname{trace}(A(B+V)^TA^TA(B+V)A^T)-\operatorname{trace}(AB^TA^TABA^T) \\ =& -\operatorname{trace}(S^TAVA^T) -\operatorname{trace}(AV^TA^TS) \\ & +\operatorname{trace}(AV^TA^TABA^T) +\operatorname{trace}(AB^TA^TAVA^T) +\operatorname{trace}(AV^TA^TAVA^T). \end{align}
Now we collect the terms that are linear in $V$:
\begin{align} -\operatorname{trace}(S^TAVA^T)=& -\operatorname{trace}((A^TSA)^TV) = -\langle A^TSA, V\rangle, \\ -\operatorname{trace}(AV^TA^TS)=& -\operatorname{trace}(V^TA^TSA) = -\langle A^TSA, V\rangle, \\ \operatorname{trace}(AV^TA^TABA^T)=&\operatorname{trace}(V^TA^TABA^TA) = \langle A^TABA^TA, V\rangle, \\ \operatorname{trace}(AB^TA^TAVA^T)=&\operatorname{trace}((A^TABA^TA)^TV) = \langle A^TABA^TA, V\rangle. \end{align}
These equalities follow from the facts that the trace is invariant under cyclic permutations and that $\operatorname{trace}(X^T)=\operatorname{trace}(X)$, so that $\langle V,X\rangle=\langle X,V\rangle$. For the remaining quadratic term, linearity of the trace gives $\operatorname{trace}(AV^TA^TAVA^T)= \|V\|\cdot\operatorname{trace}(A\frac{1}{\|V\|}V^TA^TAVA^T)$, and since this term equals $\|AVA^T\|^2\le \|A\|^4\|V\|^2$, the second factor tends to $0$ as $V\to 0$. Thus, we get
\begin{align} f(A,B+V)-f(A,B) = & \underbrace{\langle 2(-A^TSA+A^TABA^TA), V\rangle}_{\langle\nabla_{B}f(A,B), V\rangle} \\ & +\|V\|\cdot\underbrace{\operatorname{trace}(A\frac{1}{\|V\|}V^TA^TAVA^T)}_{r(V)}. \end{align}
Then $$ \nabla_{B}f(A,B)=2(-A^TSA+A^TABA^TA)=-2A^T(S-ABA^T)A. $$
The calculation of the gradient $\nabla_{(A,B)}f(A_0,B_0)$ with respect to the variable $(A,B)$, together with the remainder $r(H,V)$ of the difference $f(A_0+H,B_0+V)-f(A_0,B_0)$, is a little more laborious, but the method is analogous.
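As a sketch of how that analogous computation might go (the shorthand $M_0=A_0B_0A_0^T-S$ is introduced here only for illustration), one can write $f=\langle M,M\rangle$ with $M=ABA^T-S$ and keep the terms that are linear in $(H,V)$:
$$ \begin{aligned} f(A_0+H,B_0+V)-f(A_0,B_0) =& \; 2\langle M_0,\; HB_0A_0^T+A_0B_0H^T+A_0VA_0^T\rangle + \|(H,V)\|\cdot r(H,V) \\ =& \; \langle 2(M_0A_0B_0^T+M_0^TA_0B_0),\, H\rangle + \langle 2A_0^TM_0A_0,\, V\rangle + \|(H,V)\|\cdot r(H,V), \end{aligned} $$
where $r(H,V)$ collects the terms of order two and higher in $(H,V)$. Under this reading, the gradient would be the pair $\nabla_{(A,B)}f(A_0,B_0)=\big(2(M_0A_0B_0^T+M_0^TA_0B_0),\; 2A_0^TM_0A_0\big)$, whose second component agrees with the formula for $\nabla_Bf$ above.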
For convenience, let $M = (ABA^T - S)$.
Now express the function and its differential in terms of the Frobenius (:) product $$\eqalign{ f &= M:M \cr\cr df &= 2\,M:dM \cr &= 2\,M:(A\,dB\,A^T) + 2\,M:(dA\,BA^T) + 2\,M:(AB\,dA^T) \cr &= 2\,A^TMA:dB + 2\,(MAB^T+M^TAB):dA \cr }$$ Setting $dA=0$ yields the gradient with respect to $B$ $$\eqalign{ \frac{\partial f}{\partial B} = 2\,A^TMA \cr }$$ And setting $dB=0$ yields $$\eqalign{ \frac{\partial f}{\partial A} = 2\,MAB^T + 2\,M^TAB \cr }$$
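The two closed-form gradients can be compared against entrywise finite differences. This is a minimal sketch, assuming NumPy; the helper names `f`, `grads`, `numerical_grad` and the small random test problem are hypothetical choices made for illustration.

```python
import numpy as np

def f(S, A, B):
    """f = M:M with M = A B A^T - S (squared Frobenius norm)."""
    M = A @ B @ A.T - S
    return np.sum(M * M)

def grads(S, A, B):
    """Closed-form gradients: df/dA = 2 M A B^T + 2 M^T A B, df/dB = 2 A^T M A."""
    M = A @ B @ A.T - S
    gA = 2.0 * (M @ A @ B.T + M.T @ A @ B)
    gB = 2.0 * A.T @ M @ A
    return gA, gB

def numerical_grad(fun, X, eps=1e-6):
    """Entrywise central finite-difference gradient of a scalar function."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (fun(X + E) - fun(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
n, k = 4, 2                        # illustrative sizes (assumption)
S = rng.standard_normal((n, n))    # general, not necessarily symmetric
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, k))

gA, gB = grads(S, A, B)
num_gA = numerical_grad(lambda X: f(S, X, B), A)
num_gB = numerical_grad(lambda X: f(S, A, X), B)

err_A = np.max(np.abs(gA - num_gA))
err_B = np.max(np.abs(gB - num_gB))
print("max abs error in df/dA:", err_A)
print("max abs error in df/dB:", err_B)
```

Both errors should be at the level of finite-difference noise. Entrywise differencing costs two evaluations of `f` per matrix entry, so it is only practical as a check on small problems.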