How to use the Jacobian in the chain rule

I have an affine layer in a neural network that performs a matrix multiplication of two matrices, $X \; @ \; W^{T}$ (@ is matmul in Python notation), and I am trying to calculate $\frac {\partial L}{ \partial W^T}$ using the chain rule and the Jacobian.

As explained in Product of Jacobians and chain Rule, I need to apply the Jacobian chain rule $J(f \circ g) = JfJg$, which I believed to be the same as $J(f \circ g) = Jf \; @ \; Jg$. However, the shapes of the matrices do not match to produce the shape (3,2) of $\frac {\partial L}{\partial W^T}$.
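For reference, here is a minimal sketch of the case where $J(f \circ g) = Jf \; @ \; Jg$ works as an ordinary matrix product, because both functions take and return plain vectors. The functions `f` and `g`, the matrix `A`, and the test point `x` are hypothetical toy choices for illustration only. They are not the layer or the loss from my network.

```python
import numpy as np

# Hypothetical toy functions (assumptions, not the ones from the question):
# g maps R^3 -> R^2 (linear), f maps R^2 -> R (sum of squares).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)

def g(x):
    return A @ x                     # shape (2,)

def f(y):
    return np.sum(y ** 2)            # scalar

x = np.array([0.5, -1.0, 2.0])
y = g(x)

Jg = A                               # Jacobian of g, shape (2, 3)
Jf = (2.0 * y).reshape(1, 2)         # Jacobian of f at y, shape (1, 2)
J_chain = Jf @ Jg                    # Jacobian of f∘g, shape (1, 3)

# Central finite-difference estimate of the same Jacobian, as a sanity check.
eps = 1e-6
J_num = np.array([
    (f(g(x + eps * e)) - f(g(x - eps * e))) / (2 * eps)
    for e in np.eye(3)
]).reshape(1, 3)

print(J_chain.shape, np.allclose(J_chain, J_num, atol=1e-5))
```

Here the inner dimensions (2 and 2) line up, so the product is well defined. The difficulty in my case is that the input of $g$ is a matrix rather than a vector.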

Please help me understand what is wrong.


Please help me understand how to calculate $Jg(W^T) = \frac {\partial Y}{\partial W^T} = \begin{bmatrix} \frac{\partial Y}{\partial w_{(m=0,d=0)}} & \frac{\partial Y}{\partial w_{(m=1,d=0)}} \\ \frac{\partial Y}{\partial w_{(m=0,d=1)}} & \frac{\partial Y}{\partial w_{(m=1,d=1)}} \\ \frac{\partial Y}{\partial w_{(m=0,d=2)}} & \frac{\partial Y}{\partial w_{(m=1,d=2)}} \end{bmatrix}$.

Each $\frac{\partial Y}{\partial w_{(m,d)}}$ has the same shape (2,) as $Y$, because $w_{(m,d)}$ is a scalar. Then $\frac{\partial Y}{\partial W^T}$ becomes a (3,2) arrangement of vectors of shape (2,), as shown below, which is a strange structure.

I believe I have misunderstood something and would appreciate advice on what I am missing.

[Image: the resulting structure of $\frac{\partial Y}{\partial W^T}$]
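The shapes described above can be checked numerically. This is only a sketch under stated assumptions: $X$ is a single sample of shape (3,), $W^T$ has shape (3,2), and the loss `L = sum(Y**2)` is a made-up stand-in, since the actual loss is not part of the question. The names `dY_dWT` and `dL_dWT` are my own. With these assumptions, $\frac{\partial Y}{\partial W^T}$ can be stored as a 3-dimensional array of shape (2,3,2), and contracting it with $\frac{\partial L}{\partial Y}$ over the index of $Y$ gives a (3,2) result.

```python
import numpy as np

D, M = 3, 2                           # input features, output features
rng = np.random.default_rng(0)
X = rng.normal(size=D)                # single sample, shape (3,)  (assumption)
WT = rng.normal(size=(D, M))          # W^T, shape (3, 2)

Y = X @ WT                            # shape (2,)

# Assumed loss for illustration only: L = sum(Y**2), so dL/dY = 2*Y.
dL_dY = 2.0 * Y                       # shape (2,)

# dY_dWT[k, d, m] = dY_k / dW^T[d, m] = X[d] if k == m else 0.
dY_dWT = np.einsum("d,km->kdm", X, np.eye(M))    # shape (2, 3, 2)

# Chain rule: sum over the index k of Y instead of using a plain matmul.
dL_dWT = np.einsum("k,kdm->dm", dL_dY, dY_dWT)   # shape (3, 2)

# The contraction collapses to the outer product of X and dL/dY.
print(dY_dWT.shape, dL_dWT.shape)
print(np.allclose(dL_dWT, np.outer(X, dL_dY)))
```

Under these assumptions the "strange structure" is a rank-3 array rather than a matrix, so the product in the chain rule is a sum over the index of $Y$, not a 2-D `@` between two matrices.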