Derivative of Matrix, Dimension Change Not the Same Shape.


I computed the following derivative:

$$ \frac{\partial}{\partial\mathbf{X}} \sum_{i,j}\mathbf{A} \odot \log\left(\frac{\mathbf{A}}{\mathbf{B}^T\mathbf{X}\mathbf{C}}\right) = \frac{\mathbf{A} \odot \mathbf{K}}{\mathbf{B}^T\mathbf{X}\mathbf{C}} $$

where

$$ \mathbf{K} \leftarrow \sum_j \mathbf{C}^T \otimes \mathbf{B}^T $$

Why does the derivative have a different dimension than $\mathbf{X}$? When I run autodiff, I get the same dimension as $\mathbf{X}$.

$\mathbf{K}$ has the same dimensions as $\mathbf{A}$, and the division is elementwise. Because $\mathbf{K}$ is obtained by summing over $j$, it must be reshaped to match the dimensions of $\mathbf{A}$. It is also possible to sum over $i$ and reshape to the dimensions of $\mathbf{X}$.

The process largely follows page 6 and 7 of:

https://www.jjburred.com/research/pdf/jjburred_nmf_updates.pdf

Best answer

$ \def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} $Write the objective function $\phi$ in terms of the elementwise logarithm, elementwise division, and the matrix inner product. Then calculate its differential and gradient with respect to the $X$ matrix.

$$\eqalign{
\phi &= A:\log\LR{\frac{A}{B^TXC}} \\
&= A:\log\LR{A} - A:\log\LR{B^TXC} \\
d\phi &= -A:\LR{\frac{B^TdX\,C}{B^TXC}} \\
&= -\LR{\frac{A}{B^TXC}}:B^TdX\,C \\
&= -B\LR{\frac{A}{B^TXC}}C^T:dX \\
\grad{\phi}{X} &= -B\LR{\frac{A}{B^TXC}}C^T \\
}$$

where $(:)$ denotes the matrix inner product, which can be expressed in terms of the trace

$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{AB^T} \\
A:A &= \big\|A\big\|^2_F \\
}$$

Note that this product commutes with elementwise division $(\oslash)$ and multiplication $(\odot)$, e.g.

$$\eqalign{
A:\LR{B\odot C} \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \;=\; \LR{A\odot B}:C \\
}$$

Also note that the gradient derived above has the same dimensions as $X\,$ (as expected).
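As a sanity check, the closed-form gradient can be compared against a finite-difference approximation. The following is a minimal NumPy sketch, not part of the original derivation. The sizes `p, q, m, n`, the random positive test matrices, and the helper names `phi`, `grad_phi`, and `numerical_grad` are all assumptions chosen for illustration, with $X$ taken as $p\times q$, $B$ as $p\times m$, $C$ as $q\times n$, and $A$ as $m\times n$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, m, n = 3, 4, 5, 6  # hypothetical sizes: X is p x q, A is m x n

# Positive entries keep the elementwise logarithm well defined.
B = rng.uniform(0.5, 1.5, size=(p, m))
C = rng.uniform(0.5, 1.5, size=(q, n))
X = rng.uniform(0.5, 1.5, size=(p, q))
A = rng.uniform(0.5, 1.5, size=(m, n))


def phi(X):
    """Sum over all entries of A * log(A / (B^T X C)), taken elementwise."""
    return np.sum(A * np.log(A / (B.T @ X @ C)))


def grad_phi(X):
    """Closed-form gradient from the derivation: -B (A / (B^T X C)) C^T."""
    return -B @ (A / (B.T @ X @ C)) @ C.T


def numerical_grad(f, X, eps=1e-6):
    """Central finite differences, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G


G_closed = grad_phi(X)
G_numeric = numerical_grad(phi, X)

print("same shape as X:", G_closed.shape == X.shape)
print("max abs difference:", np.max(np.abs(G_closed - G_numeric)))
```

With these shapes, $B^TXC$ is $m\times n$ (matching $A$), and multiplying the elementwise quotient by $B$ on the left and $C^T$ on the right brings the result back to $p\times q$, which is why the gradient matches the shape of $X$.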