automatic differentiation on matrix-vector product


I was trying to derive a general formula for backward automatic differentiation of a matrix-vector product.

The task is basically the following:

$$A\cdot x= b$$

We assume that $\dfrac{dE}{db}$ is given ($E$ is some scalar value whose details are outside the scope of this question); I want to calculate $\dfrac{dE}{dA}$ and $\dfrac{dE}{dx}$.

I am stuck at writing a general form for $\dfrac{dE}{dA}$.

This is what I did:

$$\dfrac{dE}{dA} = \dfrac{dE}{db} \cdot \dfrac{db}{dA}$$

Writing this down in tensor notation:

$$\dfrac{dE}{db_i}(e_i) \cdot \dfrac{db_j}{dA_{mn}}(e_j \otimes e_m \otimes e_n)$$

But I did not manage to simplify this; clearly the result should be a rank-2 tensor.

I guessed the following result:

$$\color{red}{\dfrac{dE}{db_i} \dfrac{db_j}{dA_{mn}} (e_i) \cdot (e_j \otimes e_m \otimes e_n)}$$ $$\color{red}{= \dfrac{dE}{db_i} \dfrac{db_j}{dA_{mn}} \delta_{in} (e_j \otimes e_m)}$$ $$\color{red}{= \dfrac{dE}{db_i} \dfrac{db_j}{dA_{mi}} (e_j \otimes e_m)}$$

I looked through my books for a suitable rule, but only found tensor products where the first tensor has a higher rank than the second. A particular problem with my attempt is that $b$ can have a different size than the second dimension of $A$.

I am very happy if someone could help me here!

Greetings, Finn

There are 2 solutions below.

Answer 1:
Thanks to @user619894, who suggested setting $i=j$, I got to the following:

$$\dfrac{dE}{dA_{mn}} = \sum_i {\dfrac{dE}{db_i} \cdot \dfrac{db_i}{dA_{mn}}}$$

It's important to see that $\dfrac{db_i}{dA_{mn}}$ is nonzero only if $m=i$: since $b_i = \sum_j A_{ij}\,x_j$, we have $\dfrac{db_i}{dA_{mn}} = \delta_{im}\,x_n$.

Therefore the sum reduces to:

$$\dfrac{dE}{dA_{mn}} = \dfrac{dE}{db_m}\cdot x_n$$
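This formula is easy to check numerically. Below is a small NumPy sketch; the concrete choice $E(b) = \tfrac12\|b\|^2$ (so that $\dfrac{dE}{db} = b$) and the matrix sizes are arbitrary assumptions made just for the test:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # b deliberately has a different size than x
x = rng.standard_normal(4)

def E(b):
    # Concrete scalar function chosen for the test: E(b) = 0.5 * ||b||^2
    return 0.5 * b @ b

b = A @ x
g = b  # dE/db = b for this particular choice of E

# Closed form from the answer: dE/dA_{mn} = (dE/db_m) * x_n, i.e. the outer product g x^T
grad_A = np.outer(g, x)

# Central finite-difference check of dE/dA, entry by entry
eps = 1e-6
fd = np.zeros_like(A)
for m in range(A.shape[0]):
    for n in range(A.shape[1]):
        Ap, Am = A.copy(), A.copy()
        Ap[m, n] += eps
        Am[m, n] -= eps
        fd[m, n] = (E(Ap @ x) - E(Am @ x)) / (2 * eps)

assert np.allclose(grad_A, fd, atol=1e-4)
```

Note that `grad_A` has the same shape as $A$, even though $b$ and $x$ have different lengths, which resolves the size concern raised in the question.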

Answer 2:

$ \def\o{{\tt1}} \def\E{{\cal E}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $Let's use a colon for the Frobenius product, which is a concise notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^n\sum_{j=1}^m A_{ij}B_{ij} \;=\; \trace{A^TB} \\ }$$ We are given the following problem variables $$\eqalign{ b &= Ax &\qiq &db = A\:dx + dA\:x \\ g &= \grad{\E}{b} &&\big({\rm gradient}\big) \\ }$$ Expand the differential of $\E$ and isolate its gradients $$\eqalign{ d\E &= g:db \\ &= g:\LR{A\:dx + dA\:x} \\ &= \c{A^Tg}:{dx} + \c{gx^T}\!:{dA} \\ \grad{\E}{x} &= \c{A^Tg},\qquad\; \c{gx^T} = \grad{\E}{A} \\ \\ }$$
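The two gradients read off from the differential can be verified the same way. The sketch below again assumes the concrete test function $E(b) = \tfrac12\|b\|^2$ (so $g = b$) and checks $\dfrac{\partial\E}{\partial x} = A^Tg$ against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

def E(b):
    # Concrete scalar function chosen for the test: E(b) = 0.5 * ||b||^2
    return 0.5 * b @ b

g = A @ x                 # dE/db = b for this choice of E

# Gradients read off from dE = (A^T g):dx + (g x^T):dA
grad_x = A.T @ g
grad_A = np.outer(g, x)

# Central finite-difference check of dE/dx
eps = 1e-6
fd_x = np.array([(E(A @ (x + eps * e)) - E(A @ (x - eps * e))) / (2 * eps)
                 for e in np.eye(x.size)])

assert np.allclose(grad_x, fd_x, atol=1e-4)
```

These are exactly the vector-Jacobian products that a reverse-mode AD system implements for a matrix-vector product node.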


From the definition of the Frobenius product a number of handy formulas follow $$\eqalign{ A:A &= \|A\|_F^2 \\ A:B &= B:A = B^T:A^T \\ C:\LR{AB} &= \LR{CB^T}:A = \LR{A^TC}:B \\ }$$ Applied to vectors (i.e. $\:n\times\o$ matrices) it becomes the standard dot product $$\eqalign{ a:b \;=\; a\cdot b \;=\; a^Tb \\ }$$
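Each of these identities can be confirmed numerically. A minimal sketch, with all matrix sizes chosen arbitrarily for the test:

```python
import numpy as np

rng = np.random.default_rng(2)

def frob(X, Y):
    # Frobenius product X:Y = Tr(X^T Y) = sum_ij X_ij Y_ij
    return np.trace(X.T @ Y)

M, N = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
assert np.isclose(frob(M, M), np.linalg.norm(M, 'fro') ** 2)  # M:M = ||M||_F^2
assert np.isclose(frob(M, N), frob(N, M))                     # M:N = N:M
assert np.isclose(frob(M, N), frob(N.T, M.T))                 # M:N = N^T:M^T

# Shapes chosen so that C:(AB) is well-defined
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((3, 5))
assert np.isclose(frob(C, A @ B), frob(C @ B.T, A))           # C:(AB) = (CB^T):A
assert np.isclose(frob(C, A @ B), frob(A.T @ C, B))           # C:(AB) = (A^T C):B

# Vectors as n x 1 matrices: a:b reduces to the dot product a^T b
a, b = rng.standard_normal((4, 1)), rng.standard_normal((4, 1))
assert np.isclose(frob(a, b), (a.T @ b).item())
```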