Matrix derivative of $\mathbf{Z}=\mathbf{A}\mathbf{H}\mathbf{W}$ with respect to $\mathbf{H}$


Given $$ \mathbf{Z}=\mathbf{A}\mathbf{H}\mathbf{W} $$ where $\mathbf{Z}\in\mathbb{R}^{V\times d_n}$, $\mathbf{A}\in\mathbb{R}^{V\times V}$, $\mathbf{H}\in\mathbb{R}^{V\times d_{n-1}}$, and $\mathbf{W}\in\mathbb{R}^{d_{n-1}\times d_n}$, what is the matrix representation of $$\frac{\partial \mathbf{Z}}{\partial \mathbf{H}}$$ ? My understanding is that it should be $\mathbf{A}\mathbf{W}$, but their dimensions don't match.
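The dimension clash mentioned in the question is easy to confirm numerically. A minimal sketch with made-up sizes ($V=3$, $d_{n-1}=4$, $d_n=2$):

```python
import numpy as np

# Hypothetical sizes: V = 3, d_{n-1} = 4, d_n = 2.
V, p, d = 3, 4, 2
A = np.zeros((V, V))        # shape (3, 3)
W = np.zeros((p, d))        # shape (4, 2)

try:
    AW = A @ W              # inner dimensions V and p disagree
    conforms = True
except ValueError:
    conforms = False        # matmul refuses: A W is not even defined here
```

So no ordinary matrix product of $\mathbf{A}$ and $\mathbf{W}$ can represent the derivative; a fourth-order tensor (or a flattening) is needed, as the answers below explain.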


There are 3 answers below.

Best answer

You may be confused about how to define $\partial \mathbf{Z}/\partial \mathbf{H}$. One reasonable definition is the tensor whose $(i,j,k,\ell)$-th entry is the derivative of $Z_{i,\ell}$ with respect to $H_{j,k}$. Next, note that $$Z_{i,\ell}=\sum_{j,k}A_{i,j}H_{j,k}W_{k,\ell}.$$ It follows that $(\partial Z_{i, \ell}/\partial H_{j, k})=A_{i,j}W_{k,\ell}$. Or, more succinctly, using the outer product, $$ \frac{\partial \mathbf{Z}}{\partial \mathbf{H}} =\boxed{\mathbf{A}\otimes \mathbf{W}}. $$
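The boxed tensor can be checked numerically: since $\mathbf{Z}$ is linear in $\mathbf{H}$, perturbing a single entry $H_{j,k}$ changes $\mathbf{Z}$ exactly by the corresponding slice of $\mathbf{A}\otimes\mathbf{W}$. A sketch with small made-up dimensions:

```python
import numpy as np

# Hypothetical small dimensions: V = 3, d_{n-1} = 4, d_n = 2.
rng = np.random.default_rng(0)
V, p, d = 3, 4, 2
A = rng.standard_normal((V, V))
H = rng.standard_normal((V, p))
W = rng.standard_normal((p, d))

# Claimed derivative tensor: T[i, j, k, l] = A[i, j] * W[k, l],
# i.e. the (i, j, k, l)-th entry is dZ[i, l] / dH[j, k].
T = np.einsum('ij,kl->ijkl', A, W)

# Finite-difference check on one entry H[j, k]; the map is linear,
# so the difference quotient is exact up to floating-point error.
eps = 1e-6
j, k = 1, 2
Hp = H.copy()
Hp[j, k] += eps
dZ = (A @ Hp @ W - A @ H @ W) / eps   # shape (V, d), varies over (i, l)
assert np.allclose(dZ, T[:, j, k, :], atol=1e-4)
```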


$ \def\bbR#1{{\mathbb R}^{#1}} \def\e{\varepsilon} \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\H{H_{kl}}\def\Z{Z_{ij}} $Rename the dimensioning variables $\:(V,d_{n},d_{n-1})\to(m,n,p)$
and introduce the Cartesian basis vectors $\big\{e_k\in\bbR{m}|\,f_k\in\bbR{n}|\,g_k\in\bbR{p}\big\}$

The gradient of the matrix $H$ with respect to its own components is $$\eqalign{ \grad{H}{\H} = e_kg_l^T \\ }$$ which can be used to calculate the component-wise gradients of $Z$ $$\eqalign{ Z &= AHW \qiq \grad{Z}{\H} = A\,e_kg_l^TW \\ }$$ The fully indexed gradient is found by extracting the $(i,j)$ component $$\eqalign{ \grad{\Z}{\H} = \grad{(e_i^TZf_j)}{\H} = e_i^T\LR{Ae_kg_l^TW}f_j = A_{ik}W_{lj} \\ }$$
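The basis-vector extraction above can be verified directly: build $A\,e_kg_l^TW$, sandwich it between $e_i^T$ and $f_j$, and compare with $A_{ik}W_{lj}$. A sketch with hypothetical sizes for $(m,n,p)$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 3, 2, 4          # renamed dimensions (V, d_n, d_{n-1}) -> (m, n, p)
A = rng.standard_normal((m, m))
H = rng.standard_normal((m, p))
W = rng.standard_normal((p, n))

def basis(dim, idx):
    """Cartesian basis vector of length `dim` with a 1 in position `idx`."""
    v = np.zeros(dim)
    v[idx] = 1.0
    return v

i, j, k, l = 2, 1, 0, 3
# grad Z / dH_{kl} = A e_k g_l^T W; extract its (i, j) component.
G = A @ np.outer(basis(m, k), basis(p, l)) @ W
assert np.isclose(basis(m, i) @ G @ basis(n, j), A[i, k] * W[l, j])
```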


Another technique is to use a Kronecker product $(\otimes)$ to flatten $(Z,H)$ into vectors $(z,h)$ before calculating the gradient $$\eqalign{ z &= {\rm vec}(Z) = \LR{W^T\otimes A}h \qiq \grad zh &= \LR{W^T\otimes A} \\ }$$
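The vec/Kronecker identity $\operatorname{vec}(AHW)=(W^T\otimes A)\operatorname{vec}(H)$ holds for column-major (column-stacking) vectorization, which in numpy is `order='F'`. A quick check:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 3, 2, 4
A = rng.standard_normal((m, m))
H = rng.standard_normal((m, p))
W = rng.standard_normal((p, n))

# Column-major (Fortran-order) vectorization, the convention under which
# vec(A H W) = (W^T kron A) vec(H).
vec = lambda M: M.reshape(-1, order='F')

z = vec(A @ H @ W)
J = np.kron(W.T, A)            # the flattened gradient dz/dh
assert np.allclose(z, J @ vec(H))
```

Note that with row-major stacking the roles flip to $A\otimes W^T$, so the vectorization convention matters.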


The total differential $DZ(H; {-})$ of a function $Z(H)$ is the best linear approximation of $Z$ at $H$. But $Z$ is already linear, so $$ DZ(H; K) = AKW. $$ $K$ has the same dimensions as $H$. In index notation, writing matrices as $(1,1)$-tensors gives $$ DZ(H)^{ij}_{kl} = A^i_lW^j_k. $$ The RHS is just the tensor product of $A$ and $W$, so we could write $$ DZ(H) = A\otimes W, $$ but the problem with this notation (and the index notation) is that it isn't clear how $DZ(H)$ is supposed to be used.
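One way to make the usage of $DZ(H)$ concrete is to store it as a fourth-order array and contract it against the direction $K$ over the last two indices, which recovers $DZ(H;K)=AKW$. A sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 3, 2, 4
A = rng.standard_normal((m, m))
H = rng.standard_normal((m, p))
W = rng.standard_normal((p, n))
K = rng.standard_normal((m, p))   # a direction with the same shape as H

# D = A tensor W arranged so that D[i, j, k, l] = A[i, k] * W[l, j];
# contracting over (k, l) applies the differential: DZ(H; K) = A K W.
D = np.einsum('ik,lj->ijkl', A, W)
assert np.allclose(np.einsum('ijkl,kl->ij', D, K), A @ K @ W)
```

The einsum subscripts make explicit exactly which indices of the tensor product pair with which indices of $K$, which is the ambiguity the answer points out in the bare notation $A\otimes W$.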