I'm trying to learn how to implement the backpropagation pass given a computational graph. I've been following this guide: https://homepages.inf.ed.ac.uk/htang2/mlg2022/tutorial-3.pdf
In equation (4), page 4 of the PDF, the author writes $\frac{\partial L}{\partial A} = \frac{\partial L}{\partial g} \frac{\partial g}{\partial A} = x^T\frac{\partial L}{\partial g}$ where $g(x,A) = xA$ and $x$ is a row vector.
I'm unsure how the author computed $\frac{\partial g}{\partial A} = x^T$, as implied by the equation above. Since the image of $g(x, A)$ is a row vector and $A$ is a matrix, shouldn't $\frac{\partial g}{\partial A}$ (the derivative of a vector with respect to a matrix) be a third-order tensor?
Where did the $x^T$ come from?
$ \def\l{\lambda} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{trace}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\qif{\quad\iff\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Adhering to a strict naming convention will avoid confusion in these kinds of problems.
Use uppercase/lowercase Latin letters for matrix/vector variables and Greek letters for scalars. Furthermore, use column vectors by default and denote row vectors with explicit transposes.
For the current problem, rename the variables as follows $$ L,g,x \quad\to\quad \l,\,g^T,x^T $$
And to eliminate silly transposition errors, always use the Frobenius product $(:)$ which has the following properties $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ B:B &= \frob{B}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ \LR{XY}:B &= X:\LR{BY^T} \;=\; Y:\LR{X^TB} \\ }$$ Finally, you are correct that $\gradLR{g}{A}$ is a tensor.
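As a sanity check (my own numeric example, not part of the derivation), the Frobenius-product identities above can be verified with random matrices:

```python
import numpy as np

# Hypothetical numeric check of the Frobenius-product identities:
#   A:B = trace(A^T B),  B:B = ||B||_F^2,  (XY):B = X:(BY^T) = Y:(X^T B)
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
X = rng.standard_normal((3, 2))
Y = rng.standard_normal((2, 4))

def frob(P, Q):
    """Frobenius inner product  P:Q = sum_ij P_ij Q_ij."""
    return np.sum(P * Q)

assert np.isclose(frob(A, B), np.trace(A.T @ B))            # A:B = trace(A^T B)
assert np.isclose(frob(B, B), np.linalg.norm(B, 'fro')**2)  # B:B = ||B||_F^2
assert np.isclose(frob(X @ Y, B), frob(X, B @ Y.T))         # (XY):B = X:(BY^T)
assert np.isclose(frob(X @ Y, B), frob(Y, X.T @ B))         # (XY):B = Y:(X^T B)
```

The last two identities are what let you "move" factors across the product without ever forming a tensor derivative.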
The good news is that you never need to calculate it.
Let's apply the above ideas to your question $$\eqalign{ g &= A^Tx \qquad &\{ {\rm write\ }g\ {\rm as\ a\ column\ vector} \} \\ \c{dg} &\c{= dA^Tx} &\{ {\rm differential\ of\ }g{\rm\ in\ terms\ of\ }A^T \} \\ p &= \grad{\l}{g} \qquad &\{ {\rm gradient\ of\ }\l\ {\rm wrt\ }g \} \\ d\l &= p:dg &\{ {\rm differential\ of\ }\l\ {\rm in\ terms\ of\ }g \} \\ &= p:\CLR{dA^Tx} &\{ {\rm rearrange\ this}\ldots \} \\ &= px^T:dA^T &\{ {\rm transpose\ this}\ldots \} \\ &= xp^T:dA\qquad&\{ {\rm differential\ of\ }\l\ {\rm in\ terms\ of\ }A\} \\ \grad{\l}{A} &= xp^T &\{ {\rm gradient\ of\ }\l\ {\rm wrt\ }A \} \\ }$$ Reverting to the original (terrible) variable names yields $$\eqalign{ \grad{L}{A} &= x^T\gradLR{L}{g} }$$
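The final rule is easy to confirm numerically. A minimal sketch, using my own choice of loss $L(g) = \frac12\|g\|^2$ (so that $\frac{\partial L}{\partial g} = g$) and comparing against central finite differences:

```python
import numpy as np

# Finite-difference check of  dL/dA = x^T (dL/dg)  for  g(x, A) = x A,
# with x a row vector and the (assumed) example loss L(g) = 0.5 * ||g||^2.
rng = np.random.default_rng(1)
x = rng.standard_normal((1, 3))   # row vector, as in the question
A = rng.standard_normal((3, 4))

L = lambda A_: 0.5 * np.sum((x @ A_) ** 2)

# Analytic gradient from the backprop rule: dL/dg = g = x A, so dL/dA = x^T (x A).
grad_analytic = x.T @ (x @ A)

# Central finite differences, one entry of A at a time.
eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad_fd[i, j] = (L(A + E) - L(A - E)) / (2 * eps)

assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```

Note that the analytic gradient has the same shape as $A$, which is exactly why the tensor $\grad{g}{A}$ never needs to be materialized in a backprop implementation.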