Gradient of a partial differential matrix


$X$ is a $2 \times 2$ matrix, $W$ is a $2 \times 3$ matrix, and $Y=XW$, so $Y$ is a $2 \times 3$ matrix. Now let $L$ be a scalar function of $Y$, with gradient
$$ \frac{\partial L}{\partial Y} = \begin{bmatrix} \frac{\partial L}{\partial Y_{11}} & \frac{\partial L}{\partial Y_{12}} & \frac{\partial L}{\partial Y_{13}} \\ \frac{\partial L}{\partial Y_{21}} & \frac{\partial L}{\partial Y_{22}} & \frac{\partial L}{\partial Y_{23}} \\ \end{bmatrix} $$

By the chain rule, we know that $\frac{\partial L}{\partial X}=\frac{\partial L}{\partial Y}\frac{\partial Y}{\partial X}$ and $\frac{\partial L}{\partial W}=\frac{\partial L}{\partial Y}\frac{\partial Y}{\partial W}$. Please show that $\frac{\partial L}{\partial X}=\frac{\partial L}{\partial Y}W^{T}$.

So it seems I have to prove that $W^T=\frac{\partial Y}{\partial X}$. However, $X$ and $Y$ are not the same size, so they do not have the same number of elements either. In this situation, how do I calculate $\frac{\partial Y}{\partial X}$?

On BEST ANSWER

Given the relationship $Y=XW$ and the gradient of some scalar function $L$ with respect to $Y$, i.e. $$G=\frac{\partial L}{\partial Y},$$ you must find the gradient with respect to $X$.

Rather than using the chain rule, simply expand the differential of the function and then perform a change of variable to obtain the desired result. With $W$ held fixed, $dY = dX\,W$, so $$\eqalign{ dL &= G:dY \cr &= G:(dX\,W) \cr &= GW^T:dX \cr \frac{\partial L}{\partial X} &= GW^T \cr }$$ where the colon denotes the trace/Frobenius product, i.e. $$A:B = {\rm tr}(A^TB)$$ and the third line uses the cyclic property of the trace, which gives $A:(BC) = AC^T:B$.
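If you want to convince yourself numerically, here is a minimal NumPy sketch. It is only a sanity check, not a proof. The loss $L = \tfrac12\|Y\|_F^2$ (so that $G = Y$) is an assumption chosen purely for illustration, and the function names are hypothetical. The sketch compares $GW^T$ against central finite differences, and also builds the four-index object $\frac{\partial Y_{ij}}{\partial X_{kl}} = \delta_{ik}W_{lj}$ explicitly to show that contracting it with $G$ gives the same matrix.

```python
import numpy as np

def loss(X, W):
    # Hypothetical scalar loss, assumed for illustration: L = 0.5 * ||XW||_F^2
    Y = X @ W
    return 0.5 * np.sum(Y ** 2)

def analytic_grad_X(X, W):
    # dL/dX = G W^T, where G = dL/dY; for this particular loss G = Y
    G = X @ W
    return G @ W.T

def numeric_grad_X(X, W, eps=1e-6):
    # Central finite differences, perturbing one entry of X at a time
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (loss(X + E, W) - loss(X - E, W)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 2))
W = rng.standard_normal((2, 3))

grad_analytic = analytic_grad_X(X, W)
grad_numeric = numeric_grad_X(X, W)

# Explicit fourth-order derivative: dYdX[i, j, k, l] = delta_{ik} * W[l, j]
dYdX = np.einsum('ik,lj->ijkl', np.eye(2), W)
G = X @ W
# Chain rule as a full contraction over the indices (i, j) of Y
grad_tensor = np.einsum('ij,ijkl->kl', G, dYdX)

print(np.max(np.abs(grad_analytic - grad_numeric)))
print(np.max(np.abs(grad_analytic - grad_tensor)))
```

Both printed differences should be at rounding level. The `einsum` contraction is the sense in which the chain rule from the question holds: $\frac{\partial Y}{\partial X}$ is not the matrix $W^T$ but a $2\times3\times2\times2$ array, and summing it against $G$ over the indices of $Y$ collapses to $GW^T$.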