Differentiating vector by matrix, optimization problem


I am currently writing a simulation, which includes a loss function at the end for optimization.

In order to perform backpropagation I need to calculate some derivatives.

My forward pass is the following function: $$ s(\alpha, \beta)=N - \sum_{i=1}^{N}\prod_{j=1}^{E_{max}}\big(1-\operatorname{sigm}(\alpha I_{i,j}+\beta D_{i,j})\big) \\ \text{with: } N, E_{max} \in \mathbb{N}_0, \\ \alpha, \beta \in \mathbb{R},\\ I, D \in \mathbb{R}^{N\times E_{max}} $$
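In code, my forward pass currently looks like this (a NumPy sketch; `sigm` is the logistic function, and the row product is done element by element in a loop):

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(alpha, beta, I, D):
    # s(alpha, beta) = N - sum_i prod_j (1 - sigm(alpha*I_ij + beta*D_ij))
    N, E_max = I.shape
    F = sigm(alpha * I + beta * D)
    total = 0.0
    for i in range(N):
        row_prod = 1.0
        for j in range(E_max):        # row product, one factor at a time
            row_prod *= 1.0 - F[i, j]
        total += row_prod
    return N - total
```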

1. Is there a more concise way to calculate the row product of a matrix?

While applying the chain rule I ran into the following issue:

$$ f_{i,j}(\alpha,\beta) := \alpha I_{i,j}+\beta D_{i,j}\\ g_{i,j}(f):=\operatorname{sigm}(f_{i,j})\\ m_i(g):= \prod_{j=1}^{E_{max}}\big(1-g_{i,j}\big) \\ s(m):= N-\sum_{i=1}^{N}m_i\\ L(s):= \frac{1}{2}(y-s)^2 $$ If I want to optimize for $\alpha$, I need to calculate $\frac{dL}{d\alpha}$.

So far I have:

$\frac{dL}{ds}=-(y-s)$

$\frac{ds}{dm}=[-1, \dots, -1]$

2. But how can I get $\frac{dm}{dg}$? If I am not mistaken, I would have to differentiate a vector-valued function $m$ with respect to a matrix $g$?

3. Is there a better way to get the derivative of $s$ directly?


Accepted answer:

$ \def\o{{\large\tt1}} \def\p{{\partial}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\l{\big(} \def\r{\big)} \def\D{{\rm Diag}} $Rename some existing variables $\big(I\to A,\;N\to n,\;E_{max}\to m\big)$ and define some new variables $$\eqalign{ X &= \alpha A+\beta D \quad&\implies\quad dX = A\,d\alpha \\ F &= {\rm sigma}(X) \quad&\implies\quad dF = (J-F)\odot F\odot dX \\ G &= \log(J-F) \quad&\implies\quad dG = dF\oslash(F-J) \\ p &= \exp(G\o_m) \quad&\implies\quad dp = p\odot(dG\,\o_m) \\ P &= \D(p) \\ }$$ where $\o_n\in{\mathbb R}^{n}$ is the all-ones vector, $\,J=\o_n\o_m^T$ is the all-ones matrix, and the functions $(\log,\exp,{\rm sigma})$ are applied element-wise. The symbols $(\odot,\oslash)$ denote element-wise multiplication and division.
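A quick numerical illustration of the log/exp step (a sketch in NumPy; a random matrix with entries in $(0,1)$ stands in for ${\rm sigma}(X)$):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.uniform(0.05, 0.95, size=(4, 3))   # stand-in for sigma(X), entries in (0, 1)
ones_m = np.ones(3)

# G = log(J - F): the log turns the row product into a row sum,
# and exponentiating G @ 1_m recovers p, the row product of (1 - F)
G = np.log(1.0 - F)
p_via_log = np.exp(G @ ones_m)
p_direct = np.prod(1.0 - F, axis=1)
print(np.allclose(p_via_log, p_direct))    # True
```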

Write the remaining variables in terms of the new variables $$\eqalign{ s &= n - \o_n:p \quad&\implies\quad ds = -\o_n:dp \\ L &= \tfrac 12(s-y)^2 \quad&\implies\quad dL = (s-y)\,ds \qquad\qquad \\ }$$ Now it's just a matter of backsubstitution. $$\eqalign{ dL &= (s-y)\,ds \\ &= (y-s)\;\o_n:dp \\ &= (y-s)\;\o_n:p\odot(dG\,\o_m) \\ &= (y-s)\;p\o_m^T:dG \\ &= (y-s)\;p\o_m^T:dF\oslash(F-J) \\ &= (y-s)\,\l p\o_m^T\r\oslash(F-J):dF \\ &= (y-s)\,\l p\o_m^T\r\oslash(F-J):(J-F)\odot F\odot(A\,d\alpha) \\ &= (s-y)\,\l p\o_m^T\r\odot F:A\,d\alpha \\ &= (s-y)\;PF:A\,d\alpha \\ &= (s-y)\;{\rm Tr}(PFA^T)\,d\alpha \\ \grad{L}{\alpha} &= (s-y)\;{\rm Tr}(PFA^T) \\ }$$ where a colon is being used as a convenient product notation for the trace, e.g. $$\eqalign{ A:B &= \sum_{i=1}^n\sum_{j=1}^m A_{ij}B_{ij} \;=\; {\rm Tr}(AB^T) \\ A:A &= \big\|A\big\|_F^2 \\ }$$ Note that the colon and element-wise products commute with themselves and each other $$\eqalign{ A:B &= B:A \\ A\odot B &= B\odot A \\ (A\odot B):C &= A:(B\odot C) \;=\; \sum_{i=1}^n\sum_{j=1}^m A_{ij}B_{ij}C_{ij} \\ }$$ The key idea is that the logarithm turns the product into a sum (which is easy to write in standard matrix notation) followed by exponentiation. Also, the derivative of the logistic function is well-known.
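As a sanity check, the final formula can be compared against a central finite difference (a NumPy sketch with made-up test values; variable names follow the renaming above, and `np.prod(..., axis=1)` is also the concise row product asked about in question 1):

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy data; A (= I in the question), D, y, alpha, beta are arbitrary test values
rng = np.random.default_rng(0)
n, m = 5, 4
A = rng.normal(size=(n, m))
D = rng.normal(size=(n, m))
y, alpha, beta = 2.0, 0.3, -0.7

def loss(a):
    F = sigm(a * A + beta * D)
    p = np.prod(1.0 - F, axis=1)       # row product of (1 - F)
    s = n - p.sum()
    return 0.5 * (s - y) ** 2

# analytic gradient: dL/d(alpha) = (s - y) * Tr(P F A^T), with P = Diag(p)
F = sigm(alpha * A + beta * D)
p = np.prod(1.0 - F, axis=1)
s = n - p.sum()
grad = (s - y) * np.sum(p[:, None] * F * A)   # = (s - y) * trace(diag(p) @ F @ A.T)

# central finite-difference approximation for comparison
eps = 1e-6
fd = (loss(alpha + eps) - loss(alpha - eps)) / (2 * eps)
print(abs(grad - fd))   # ~1e-10, i.e. the two agree
```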