Matrix product rule

I have a list of functions $f_1, \dots, f_n$ where $f_i: \mathbb{R}^h \to \mathbb{R}^{n_i \times n_{i+1}}$ for $i \in \{1, \dots, n-1\}$ and $f_n: \mathbb{R}^h \to \mathbb{R}^{n_n \times 1}$. Essentially, I have a product:

$$f_1(\mathbf{x})\,f_2(\mathbf{x})\,f_3(\mathbf{x})\cdots f_n(\mathbf{x})$$

and I would like to take its derivative with respect to $\mathbf{x} \in \mathbb{R}^h$, i.e. $\frac{\partial}{\partial \mathbf{x}}\big(f_1(\mathbf{x})f_2(\mathbf{x})f_3(\mathbf{x})\cdots f_n(\mathbf{x})\big)$. From calculus I know this should involve some form of the product rule, but I am not sure how to express it, because the derivative of each matrix-valued factor becomes a third-order tensor. I would be grateful for any insights!
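To see the "each becomes a tensor" issue concretely: the derivative of a matrix-valued $f_i$ has one index for each of its two matrix dimensions plus one for $\mathbf{x}$. The minimal NumPy sketch below uses a hypothetical test function $F(\mathbf{x})=\sin(W\mathbf{x})$ (applied entrywise, with $W$ and all sizes chosen arbitrarily for illustration) and central differences to show that this third-order tensor turns into an ordinary Jacobian matrix once $F$ is vectorized by stacking its columns.

```python
import numpy as np

# Hypothetical matrix-valued function F: R^h -> R^{m x k}, F(x) = sin(W x) entrywise.
# Sizes and W are arbitrary illustration choices, not from the question.
h, m, k = 3, 2, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((m, k, h))

def F(x):
    return np.sin(W @ x)  # (m, k, h) @ (h,) -> (m, k)

def derivative_tensor(F, x, eps=1e-6):
    # Central differences: T[i, j, l] approximates dF[i, j] / dx[l]
    slices = []
    for l in range(x.size):
        e = np.zeros_like(x)
        e[l] = eps
        slices.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.stack(slices, axis=-1)

x0 = rng.standard_normal(h)
T = derivative_tensor(F, x0)
print(T.shape)  # (2, 4, 3): a third-order tensor

# Column-stacking vec flattens the (m, k) part in Fortran order, so the
# tensor becomes an ordinary (m*k) x h Jacobian of vec(F).
J = T.reshape(m * k, h, order="F")
print(J.shape)  # (8, 3)
```

Column `l` of `J` is ${\rm vec}(\partial F/\partial x_l)$, which is exactly the kind of object that appears as $\partial a/\partial x$ after vectorization.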

Best answer

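Given the product of some matrices and a vector $$p = ABCy$$ (for example $A=f_1(\mathbf{x})$, $B=f_2(\mathbf{x})$, $C=f_3(\mathbf{x})$, $y=f_4(\mathbf{x})$ when $n=4$), calculate the differential, then vectorize each term, then find the gradient (strictly, the Jacobian, since $p$ is a vector) with respect to $x$. $$\eqalign{ dp &= ABC\,dy + AB\,dC\,y + A\,dB\,Cy + dA\,BCy \\ &= ABC\,dy + (y^T\otimes AB)dc + (y^TC^T\otimes A)db + (y^TC^TB^T\otimes I)da \\ \frac{\partial p}{\partial x} &= ABC\frac{\partial y}{\partial x} + (y^T\otimes AB)\frac{\partial c}{\partial x} + (y^TC^T\otimes A)\frac{\partial b}{\partial x} + (y^TC^TB^T\otimes I)\frac{\partial a}{\partial x} \\ }$$ where $\otimes$ is the Kronecker product, $I$ is the identity matrix with as many rows as $A$, and $\;a={\rm vec}(A),\,b={\rm vec}(B),\,$etc. Each $\frac{\partial a}{\partial x}$ is an ordinary matrix (the Jacobian of the vectorized factor), so no third-order tensors are needed.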
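The same steps extend to the $n$-factor product in the question, giving one Kronecker term per factor. Writing $F_k=f_k(\mathbf{x})\in\mathbb{R}^{n_k\times n_{k+1}}$ for $k<n$ and $y=f_n(\mathbf{x})$, and treating empty products as identity matrices, the result should read
$$\frac{\partial p}{\partial \mathbf{x}} = F_1\cdots F_{n-1}\,\frac{\partial y}{\partial \mathbf{x}} + \sum_{k=1}^{n-1}\Big(\big(F_{k+1}\cdots F_{n-1}\,y\big)^T\otimes F_1\cdots F_{k-1}\Big)\frac{\partial\,{\rm vec}(F_k)}{\partial \mathbf{x}}$$
which reduces to the four-term formula above when $n=4$. Here is a minimal NumPy sketch of this sum with a finite-difference check; the helper names and the test functions $\sin(W_k\mathbf{x})$, $\tanh(V\mathbf{x})$ are hypothetical choices made only for illustration.

```python
import numpy as np
from functools import reduce

def vec_jac(T):
    # (rows, cols, h) derivative tensor -> (rows*cols, h) Jacobian of column-stacked vec(F)
    r, c, h = T.shape
    return T.reshape(r * c, h, order="F")

def product_rule_jacobian(Fs, dFs, y, dy):
    """Jacobian of p = F_1 ... F_{n-1} y (hypothetical helper).
    Fs[k]: matrix factor; dFs[k]: its derivative tensor, shape (rows, cols, h);
    y: vector factor; dy: its Jacobian, shape (len(y), h)."""
    n1 = Fs[0].shape[0]
    J = reduce(np.matmul, Fs) @ dy  # term F_1 ... F_{n-1} dy/dx
    for k in range(len(Fs)):
        L = reduce(np.matmul, Fs[:k], np.eye(n1))  # F_1 ... F_{k-1}, identity if empty
        R = reduce(np.matmul, Fs[k + 1:], np.eye(Fs[k].shape[1])) @ y  # F_{k+1} ... y
        J = J + np.kron(R[None, :], L) @ vec_jac(dFs[k])  # (R^T kron L) dvec(F_k)/dx
    return J

# Hypothetical test functions (illustration only): F_k(x) = sin(W_k x), y(x) = tanh(V x)
rng = np.random.default_rng(1)
h, dims = 3, [2, 3, 4, 5]
Ws = [rng.standard_normal((dims[k], dims[k + 1], h)) for k in range(3)]
V = rng.standard_normal((dims[-1], h))

def factors(x):
    Fs = [np.sin(W @ x) for W in Ws]
    dFs = [np.cos(W @ x)[..., None] * W for W in Ws]
    y = np.tanh(V @ x)
    dy = (1 - y**2)[:, None] * V
    return Fs, dFs, y, dy

def p(x):
    Fs, _, y, _ = factors(x)
    return reduce(np.matmul, Fs) @ y

x0 = rng.standard_normal(h)
J = product_rule_jacobian(*factors(x0))

# Central-difference Jacobian of p for comparison
eps = 1e-6
J_fd = np.stack([(p(x0 + eps * e) - p(x0 - eps * e)) / (2 * eps) for e in np.eye(h)], axis=1)
print(np.allclose(J, J_fd, atol=1e-6))  # expected to print True
```

Keeping each derivative tensor in `(rows, cols, h)` layout and reshaping it in Fortran order keeps the Kronecker factors consistent with the column-stacking convention used in the answer.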

The standard (column-stacking) vectorization formula is $$\eqalign{ F &= ABC \\ {\rm vec}(F) &= (C^T\otimes A)\,{\rm vec}(B) \\ f &= (C^T\otimes A)\,b \\ }$$
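The identity is easy to confirm numerically. In this minimal NumPy sketch the matrices are random and chosen only for illustration, and `vec` is a hypothetical helper that implements column stacking via `reshape(..., order="F")`.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization: stack the columns of M on top of each other
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

F = A @ B @ C
lhs = vec(F)                     # length 10
rhs = np.kron(C.T, A) @ vec(B)   # (10 x 12) @ (12,)
print(np.allclose(lhs, rhs))     # True
```

NumPy's default `reshape` is row-major (`order="C"`), which stacks rows instead of columns; with that convention the identity becomes ${\rm vec}_r(ABC) = (A\otimes C^T)\,{\rm vec}_r(B)$.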