Prove that $\nabla_{A} (XAY) = Y^{T}X^{T}$


Equation (3) in Dawen Liang's Some Important Properties for Matrix Calculus is

$$\nabla_{A} (XAY) = Y^{T}X^{T}$$

If you know how this can be derived, then please let me know. Thank you.


Replying To H. H. Rugh

Thanks for the reply. May I explain my derivation? Suppose $$ X\in \Re^{m\times n},\quad A\in \Re^{n\times o},\quad Y\in \Re^{o\times p}, $$ where $\widetilde { x } _{ i }^{T}$ is the $i$th row vector of $X$ and $y_{j}$ is the $j$th column vector of $Y$. Then I think the derivative of a single component $(XAY)_{ij}$ is $$\frac { \partial (XAY)_{ ij } }{ \partial A } =\frac { \partial \widetilde { x }_{ i }^{T} Ay_{ j } }{ \partial A } =\widetilde { x }_{ i} y_{ j }^{ T }\in \Re ^{ n\times o }. $$ So finally I concluded that $$\frac { \partial XAY }{ \partial A } =\begin{bmatrix} \widetilde { x } _{ 1 }y_{ 1 }^{ T } & \widetilde { x } _{ 2 }y_{ 1 }^{ T } & \cdots & \widetilde { x } _{ m }y_{ 1 }^{ T } \\ \widetilde { x } _{ 1 }y_{ 2 }^{ T } & \widetilde { x } _{ 2 }y_{ 2 }^{ T } & \cdots & \widetilde { x } _{ m }y_{ 2 }^{ T } \\ \vdots & \vdots & \ddots & \vdots \\ \widetilde { x } _{ 1 }y_{ p }^{ T } & \widetilde { x } _{ 2 }y_{ p }^{ T } & \cdots & \widetilde { x } _{ m }y_{ p }^{ T } \end{bmatrix}\in \Re ^{ pn\times mo }$$ in denominator layout notation. Is my derivation wrong? See also https://en.wikipedia.org/wiki/Matrix_calculus#Other_matrix_derivatives

The reason I am asking is that I suspect there might be a typo in eq. (3), but I am not confident, because I am not familiar with matrix calculus. If you have a correct and detailed derivation, please share it; a pointer to a document that includes the derivation would also be very helpful. I spent all day yesterday looking for a derivation of eq. (3). I assumed that if eq. (3) is one of the basic properties, then many documents would include it, but I could not find it anywhere.


There are 4 best solutions below


First prove that $$ \nabla_A \mathrm{tr}(AX)=X^T. $$ This follows by direct calculation (or by using the linearity of differentiation and finding the derivative only for a matrix $X$ that is all zero except for $x_{ij}$).
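The direct calculation can be written out entrywise. This is a sketch that assumes the convention $(\nabla_A f)_{kl}=\partial f/\partial A_{kl}$ and that $A$ has no special structure (its entries are independent): $$ \mathrm{tr}(AX)=\sum_{i}\sum_{j}A_{ij}X_{ji} \quad\Longrightarrow\quad \frac{\partial\, \mathrm{tr}(AX)}{\partial A_{kl}}=X_{lk}=\left(X^T\right)_{kl}. $$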

Then, using this and the cyclic property of the trace (and treating $XAY$ as a scalar, so that it equals its own trace), we have: $$ \nabla_A(XAY)=\nabla_A\mathrm{tr}(XAY)=\nabla_A\mathrm{tr}(AYX)=(YX)^T=X^TY^T. $$


I believe eq. (3) is wrong and the result should be $X^T Y^T$ (reverse order). Although the cited text uses square matrices for this case, the result should also hold for a non-square matrix $A$ of size, say, $m\times n$. Taking the gradient of a scalar with respect to $A$ should yield a matrix of the same size as $A$, which is consistent with the definition on the first page of the cited text (and not with the claimed result).

For the $(k,l)$'th element you have (summation over repeated indices): $$ \left( \nabla_A (X A Y) \right)_{k,l} = \frac{\partial}{\partial A_{k,l}} X_i A_{ij} Y_j = X_i \delta_{ik} \delta_{jl} Y_j = X_k Y_l = \left(X^T Y^T\right)_{k,l} $$
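As a sanity check of the scalar case, the entrywise result $X_k Y_l$ can be compared against a finite-difference gradient. The following is only a minimal sketch in plain Python: the helper names (`scalar_form`, `numerical_gradient`) and the numeric values are made up for illustration, and $X$ is assumed to be a row vector and $Y$ a column vector, so that $XAY$ is a scalar.

```python
# Finite-difference check of the scalar case s(A) = X A Y,
# where X is a row vector (stored as list x) and Y a column vector (list y).
# All names and values here are illustrative assumptions.

def scalar_form(x, A, y):
    """Return sum_i sum_j x[i] * A[i][j] * y[j]."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def numerical_gradient(x, A, y, h=1e-6):
    """Central-difference estimate of d s / d A[k][l] for every entry of A."""
    grad = []
    for k in range(len(A)):
        row = []
        for l in range(len(A[0])):
            plus = [r[:] for r in A]
            minus = [r[:] for r in A]
            plus[k][l] += h
            minus[k][l] -= h
            diff = scalar_form(x, plus, y) - scalar_form(x, minus, y)
            row.append(diff / (2 * h))
        grad.append(row)
    return grad

x = [1.0, -2.0, 0.5]                       # X is 1 x 3
y = [3.0, 4.0]                             # Y is 2 x 1
A = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # A is 3 x 2

numeric = numerical_gradient(x, A, y)

# Outer product X^T Y^T, with entries x[k] * y[l]; same shape as A.
outer = [[x[k] * y[l] for l in range(len(y))] for k in range(len(x))]

max_err = max(abs(numeric[k][l] - outer[k][l])
              for k in range(len(x)) for l in range(len(y)))
print("largest deviation from X^T Y^T:", max_err)
```

The gradient comes out with the same $3\times 2$ shape as $A$, which is the shape of $X^T Y^T$ here.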


Expressed in index notation (and the summation convention), your function is $$F_{il} = X_{ij}A_{jk}Y_{kl}$$ The differential is simply $$dF_{il} = X_{ij}\,dA_{jk}\,Y_{kl}$$ So, assuming that $A$ has no special structure, the gradient is $$\eqalign{ \frac{\partial F_{il}}{\partial A_{ps}} &= X_{ij}(\delta_{jp}\delta_{ks})Y_{kl} \cr &= X_{ip} Y_{sl} \cr }$$ Not sure how to write this result using standard matrix operations.
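One common way to express this with standard matrix operations is through vectorization and the Kronecker product. As a sketch, assuming $\operatorname{vec}$ stacks the columns of a matrix into a single vector, the identity $\operatorname{vec}(XAY)=(Y^T\otimes X)\operatorname{vec}(A)$ gives $$ \frac{\partial \operatorname{vec}(F)}{\partial \operatorname{vec}(A)^T} = Y^T\otimes X, $$ and the entry of $Y^T\otimes X$ in the row belonging to $F_{il}$ and the column belonging to $A_{ps}$ is $\left(Y^T\right)_{ls}X_{ip}=X_{ip}Y_{sl}$, which matches the index result.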


Let $f:A\in M_{n,o}\rightarrow XAY\in M_{m,p}$; $f$ takes its values in $\mathbb{R}^{mp}$ (and not in $\mathbb{R}$). Thus $f$ admits $mp$ gradient vectors; more precisely, $\nabla(f)(A)$ is a tensor defined by the gradients $\nabla(f_{i,j})(A)$ as $(i,j)$ ranges over $i\le m$, $j\leq p$.

$f_{i,j}(A)=(XAY)_{i,j}=e_i^TXAYe_j=\mathrm{tr}(e_i^TXAYe_j)=\mathrm{tr}(Ye_je_i^TXA)=\mathrm{tr}(YE_{j,i}XA)=\langle (YE_{j,i}X)^T,A\rangle$, where $\langle\cdot,\cdot\rangle$ is the standard scalar product.

By duality, $\nabla(f_{i,j})(A)=(YE_{j,i}X)^T=X^TE_{i,j}Y^T$.
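The formula $\nabla(f_{i,j})(A)=X^TE_{i,j}Y^T$ can be checked numerically for every component. The following is a rough sketch in plain Python, not a definitive implementation: the helper names (`matmul`, `transpose`, `numeric_grad`, `closed_form_grad`) and the matrix values are assumptions made for illustration, and indices are 0-based as usual in Python.

```python
# Compare a finite-difference gradient of (XAY)[i][j] with X^T E_ij Y^T.
# Matrices are lists of rows; all names and values are illustrative.

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

X = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]             # 2 x 3
A = [[0.5, 1.0], [2.0, -1.0], [0.0, 3.0]]           # 3 x 2
Y = [[1.0, 0.0, 2.0, 1.0], [-1.0, 3.0, 0.5, 2.0]]   # 2 x 4

def f_entry(A_mat, i, j):
    """Component (i, j) of X A Y."""
    return matmul(matmul(X, A_mat), Y)[i][j]

def numeric_grad(i, j, h=1e-6):
    """Central-difference gradient of f_entry(., i, j) with respect to A."""
    G = [[0.0] * len(A[0]) for _ in A]
    for r in range(len(A)):
        for c in range(len(A[0])):
            plus = [row[:] for row in A]
            minus = [row[:] for row in A]
            plus[r][c] += h
            minus[r][c] -= h
            G[r][c] = (f_entry(plus, i, j) - f_entry(minus, i, j)) / (2 * h)
    return G

def closed_form_grad(i, j):
    """X^T E_ij Y^T, where E_ij has a single 1 at position (i, j)."""
    E = [[1.0 if (a == i and b == j) else 0.0 for b in range(len(Y[0]))]
         for a in range(len(X))]
    return matmul(matmul(transpose(X), E), transpose(Y))

# Largest discrepancy over all components (i, j) and all entries of A.
worst = 0.0
for i in range(len(X)):
    for j in range(len(Y[0])):
        N, C = numeric_grad(i, j), closed_form_grad(i, j)
        for r in range(len(A)):
            for c in range(len(A[0])):
                worst = max(worst, abs(N[r][c] - C[r][c]))
print("largest discrepancy:", worst)
```

Each gradient has the same shape as $A$, and its $(r,c)$ entry is $X_{i,r}Y_{c,j}$, which is the outer product of the $i$th row of $X$ and the $j$th column of $Y$.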