Matrix calculus with row vectors


The matrix calculus rules I can find all assume column vectors. Are the rules different for row vectors? (I am having a hard time finding any.)

I used them when deriving a formula for backpropagation.
\begin{gather*}
Y = XW + B\\
X=\begin{bmatrix} x_{0} & x_{1} & x_{2} \end{bmatrix} ,\ Y=\begin{bmatrix} y_{0} & y_{1} \end{bmatrix} ,\ W=\begin{bmatrix} w_{00} & w_{01}\\ w_{10} & w_{11}\\ w_{20} & w_{21} \end{bmatrix} ,\ B=\begin{bmatrix} b_{0} & b_{1} \end{bmatrix}
\end{gather*}

\begin{gather*}
\left(\frac{\partial L}{\partial W}\right)^{T} =\begin{bmatrix} \frac{\partial L}{\partial w_{00}} & \frac{\partial L}{\partial w_{01}}\\ \frac{\partial L}{\partial w_{10}} & \frac{\partial L}{\partial w_{11}}\\ \frac{\partial L}{\partial w_{20}} & \frac{\partial L}{\partial w_{21}} \end{bmatrix} =\begin{bmatrix} \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{00}} & \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{01}}\\ \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{10}} & \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{11}}\\ \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{20}} & \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{21}} \end{bmatrix}\\ \\
\text{Focus on one term:}\\
y_{0} = w_{00} x_{0} +w_{10} x_{1} +w_{20} x_{2} +b_{0}\\
y_{1} = w_{01} x_{0} +w_{11} x_{1} +w_{21} x_{2} +b_{1}\\ \\
\frac{\partial Y}{\partial w_{00}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{00}}\\ \frac{\partial y_{1}}{\partial w_{00}} \end{bmatrix} =\begin{bmatrix} x_{0}\\ 0 \end{bmatrix}\\
\color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial w_{00}} = \begin{bmatrix} \color{red}{\frac{\partial L}{\partial y_{0}}} & \color{red}{\frac{\partial L}{\partial y_{1}}} \end{bmatrix}\begin{bmatrix} x_{0}\\ 0 \end{bmatrix} =\color{red}{\frac{\partial L}{\partial y_{0}}} \ x_{0} + \color{red}{\frac{\partial L}{\partial y_{1}}} \cdot 0 =\color{red}{\frac{\partial L}{\partial y_{0}}} \ x_{0}\\ \\
\frac{\partial Y}{\partial w_{10}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{10}}\\ \frac{\partial y_{1}}{\partial w_{10}} \end{bmatrix} =\begin{bmatrix} x_{1}\\ 0 \end{bmatrix} ,\ \frac{\partial Y}{\partial w_{01}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{01}}\\ \frac{\partial y_{1}}{\partial w_{01}} \end{bmatrix} =\begin{bmatrix} 0\\ x_{0} \end{bmatrix} ,\ \frac{\partial Y}{\partial w_{11}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{11}}\\ \frac{\partial y_{1}}{\partial w_{11}} \end{bmatrix} =\begin{bmatrix} 0\\ x_{1} \end{bmatrix} ,\\
\frac{\partial Y}{\partial w_{20}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{20}}\\ \frac{\partial y_{1}}{\partial w_{20}} \end{bmatrix} =\begin{bmatrix} x_{2}\\ 0 \end{bmatrix} ,\ \frac{\partial Y}{\partial w_{21}} = \begin{bmatrix} \frac{\partial y_{0}}{\partial w_{21}}\\ \frac{\partial y_{1}}{\partial w_{21}} \end{bmatrix} =\begin{bmatrix} 0\\ x_{2} \end{bmatrix}\\ \\
\text{Finally:}\\
\left(\frac{\partial L}{\partial W}\right)^{T} =\begin{bmatrix} \frac{\partial L}{\partial y_{0}} \ x_{0} & \frac{\partial L}{\partial y_{1}} \ x_{0}\\ \frac{\partial L}{\partial y_{0}} \ x_{1} & \frac{\partial L}{\partial y_{1}} \ x_{1}\\ \frac{\partial L}{\partial y_{0}} \ x_{2} & \frac{\partial L}{\partial y_{1}} \ x_{2} \end{bmatrix} =\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}\begin{bmatrix} \frac{\partial L}{\partial y_{0}} & \frac{\partial L}{\partial y_{1}} \end{bmatrix} = X^{T}\color{red}{\frac{\partial L}{\partial Y}}
\end{gather*}

However, I am not sure if the final result has the correct shape.

\begin{gather*}
\frac{\partial L}{\partial X} = \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial X}\\
\frac{\partial Y}{\partial X} =\begin{bmatrix} \frac{\partial y_{0}}{\partial x_{0}} & \frac{\partial y_{0}}{\partial x_{1}} & \frac{\partial y_{0}}{\partial x_{2}}\\ \frac{\partial y_{1}}{\partial x_{0}} & \frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} \end{bmatrix} =\begin{bmatrix} w_{00} & w_{10} & w_{20}\\ w_{01} & w_{11} & w_{21} \end{bmatrix} =W^{T}\\
\frac{\partial L}{\partial X} = \color{red}{\frac{\partial L}{\partial Y}} W^{T} ,\quad \color{red}{\frac{\partial L}{\partial Y} = \begin{bmatrix} \frac{\partial L}{\partial y_{0}} & \frac{\partial L}{\partial y_{1}} \end{bmatrix}}\\ \\
\text{If the rules were simply reversed:}\\
\color{red}{\frac{\partial L}{\partial Y} = \begin{bmatrix} \frac{\partial L}{\partial y_{0}}\\ \frac{\partial L}{\partial y_{1}} \end{bmatrix}} ,\quad \frac{\partial Y}{\partial X} =\begin{bmatrix} \frac{\partial y_{0}}{\partial x_{0}} & \frac{\partial y_{1}}{\partial x_{0}}\\ \frac{\partial y_{0}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{1}}\\ \frac{\partial y_{0}}{\partial x_{2}} & \frac{\partial y_{1}}{\partial x_{2}} \end{bmatrix}\\
\text{Then the dimensions for } \color{red}{\frac{\partial L}{\partial Y}}\frac{\partial Y}{\partial X} \text{ won't match.}
\end{gather*}

Best answer

$ \def\o{{\tt1}} \def\BR#1{\Big(#1\Big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Assume that the gradient $\LR{G=\grad LY}$ is known and use it to find the other gradients.

Calculate the differential of the loss function, holding $B$ constant so that $dY = dX\:W + X\:dW$: $$\eqalign{ dL &= G:dY \\ &= G:\BR{dX\:W + X\:dW} \\ &= \LR{GW^T}:dX + \LR{X^TG}:dW \\ }$$ Then hold $W$ constant to obtain the gradient with respect to $X,\,$ and vice versa $$\eqalign{ \grad LX &= GW^T \;\qquad\; \grad LW &= X^TG \qquad \\ \\ }$$
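The two gradient formulas can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the loss $L=\tfrac12\sum_j y_j^2$ (so that $G=Y$), the numeric values, and all helper names are made up for the check and do not come from the question. It compares $X^TG$ and $GW^T$ against central finite differences, using plain Python lists so that nothing beyond the standard library is needed.

```python
# Finite-difference check of dL/dW = X^T G and dL/dX = G W^T (row-vector convention).
# Assumption: L = 0.5 * sum(Y**2), hence G = dL/dY = Y. This loss is a stand-in
# chosen for the check; the derivation itself works for any differentiable L.

def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def forward(X, W, B):
    """Y = XW + B, with B the same shape as XW."""
    XW = matmul(X, W)
    return [[XW[i][j] + B[i][j] for j in range(len(B[0]))] for i in range(len(B))]

def loss(X, W, B):
    Y = forward(X, W, B)
    return 0.5 * sum(y * y for row in Y for y in row)

def numeric_grad(f, M, eps=1e-6):
    """Central-difference gradient of the scalar f() with respect to M (same shape as M)."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            up = f()
            M[i][j] = old - eps
            down = f()
            M[i][j] = old          # restore the entry before moving on
            grad[i][j] = (up - down) / (2 * eps)
    return grad

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

X = [[1.0, -2.0, 0.5]]                       # 1 x 3 row vector
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]   # 3 x 2
B = [[0.1, -0.3]]                            # 1 x 2 row vector

G = forward(X, W, B)                 # G = Y for the assumed loss, shape 1 x 2
grad_W = matmul(transpose(X), G)     # X^T G, shape 3 x 2 (same as W)
grad_X = matmul(G, transpose(W))     # G W^T, shape 1 x 3 (same as X)

num_W = numeric_grad(lambda: loss(X, W, B), W)
num_X = numeric_grad(lambda: loss(X, W, B), X)

print("max |analytic - numeric| for W:", max_abs_diff(grad_W, num_W))
print("max |analytic - numeric| for X:", max_abs_diff(grad_X, num_X))
```

Both printed differences should be at the level of floating-point rounding. The shapes also answer the shape question directly: $X^TG$ is $3\times 2$ like $W$, and $GW^T$ is $1\times 3$ like $X$.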


In the above, a colon is used to denote the Frobenius product.
It has the following properties: $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ C:\LR{AB} &= \LR{CB^T}:A \;=\; \LR{A^TC}:B \\ }$$
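These identities are easy to spot-check on small integer matrices, where the arithmetic is exact. The following is a minimal sketch; the matrices and the helper names `frob`, `trace`, `matmul` and `transpose` are arbitrary choices made for illustration.

```python
# Spot-check of the Frobenius product identities on small integer matrices.
# The matrices below are arbitrary; integers keep every comparison exact.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frob(A, B):
    """Frobenius product A:B, the sum of elementwise products of same-shape matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4], [5, 6]]                # 3 x 2
M = [[1, 0], [2, -1], [0, 3]]               # 3 x 2, same shape as A
B = [[7, 8, 9], [1, 0, -1]]                 # 2 x 3
C = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]     # 3 x 3, same shape as the product AB

# A:M = Tr(A^T M)
trace_ok = frob(A, M) == trace(matmul(transpose(A), M))

# A:M = M:A = M^T:A^T
symmetry_ok = frob(A, M) == frob(M, A) == frob(transpose(M), transpose(A))

# A:A is the sum of squared entries, i.e. the squared Frobenius norm
norm_ok = frob(A, A) == sum(a * a for row in A for a in row)

# C:(AB) = (C B^T):A = (A^T C):B  -- the rule used to isolate dX and dW
lhs = frob(C, matmul(A, B))
mid = frob(matmul(C, transpose(B)), A)
rhs = frob(matmul(transpose(A), C), B)
rearrange_ok = lhs == mid == rhs

print(trace_ok, symmetry_ok, norm_ok, rearrange_ok)
```

The last identity is the one doing the work in the derivation: it moves a factor from one side of the colon to the other so that the differential ($dX$ or $dW$) stands alone.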

Note that the above derivation is quite general. It does not matter if $\,\{B,X,Y\}\,$ are row vectors or column vectors. They could even be matrices.
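As one illustration, here is the same calculation in a column-vector convention. This setup is an assumption introduced for comparison and is not taken from the question: $x\in\mathbb{R}^{n\times 1}$, $W\in\mathbb{R}^{m\times n}$, $y = Wx + b$, with $g=\partial L/\partial y$ assumed known and $b$ held constant. $$\eqalign{ dL &= g:dy \\ &= g:\left(dW\,x + W\,dx\right) \\ &= \left(g\,x^T\right):dW + \left(W^Tg\right):dx \\ }$$ so that $$\frac{\partial L}{\partial W} = g\,x^T \qquad\qquad \frac{\partial L}{\partial x} = W^Tg$$ Each gradient again has the same shape as the variable it is taken with respect to, and the steps are identical to the row-vector case; only the order of the factors changes.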