Question on matrix over matrix derivation


Consider the recursive relations $$\mathbf{H}_k=\sigma(\mathbf{Z}_k),\ \mathbf{Z}_k=\mathbf{A}\mathbf{H}_{k-1}\mathbf{W}_k$$ where $\mathbf{Z}_k\in\mathbb{R}^{m\times n_k}$, $\mathbf{A}\in\mathbb{R}^{m\times m}$, $\mathbf{H}_{k-1}\in\mathbb{R}^{m\times o_k}$, $\mathbf{W}_k\in\mathbb{R}^{o_k\times n_k}$, $\sigma$ is an element-wise function, and $1\leq k\leq K$. Also, let $L=f(\mathbf{Z}_K)$, where $L\in \mathbb{R}$ and $f:\mathbb R^{m\times n_K}\to \mathbb R$.

Problem:

Given $\mathbf{E}_K:=\partial L/\partial \mathbf{Z}_K$, derive a relation to compute $\partial L/\partial \mathbf{W}_k$ for $1\leq k \leq K$.


My attempt:

Using the chain rule, I can write the derivative as $$\frac{\partial L}{\partial \mathbf{W}_{k}}=\frac{\partial L}{\partial \mathbf{Z}_{k}} \frac{\partial \mathbf{Z}_{k}}{\partial \mathbf{W}_{k}}$$ For the first term on the RHS, write $\mathbf{E}_k:=\partial L/\partial \mathbf{Z}_k$. In a related question, we derived the recursive relation $$\mathbf{E}_{k-1}=\mathbf{A}^T \mathbf{E}_{k} \mathbf{W}_{k}^T \odot \mathbf{S}_{k-1}$$ where $\mathbf{S}_k=\sigma'(\mathbf{Z}_{k})$. I can derive the second term on the RHS using index notation. We have $${Z_{(k)}}_{i p}=\sum_{j, \ell} A_{i j} {H_{(k-1)}}_{j \ell} {W_{(k)}}_{\ell p}$$ Thus $$\frac{\partial {Z_{(k)}}_{i p}}{\partial {W_{(k)}}_{a b}}=\delta_{b p} \sum_j A_{i j} {H_{(k-1)}}_{j a}$$
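
As a sanity check, the index formula above can be compared against a numerical Jacobian. The following is a minimal sketch, assuming NumPy; the sizes `m`, `o`, `n` and the random matrices are hypothetical stand-ins for $m$, $o_k$, $n_k$, $\mathbf{A}$, $\mathbf{H}_{k-1}$ and $\mathbf{W}_k$, not values from the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
m, o, n = 4, 3, 5                      # hypothetical sizes for m, o_k, n_k
A = rng.standard_normal((m, m))
H = rng.standard_normal((m, o))        # plays the role of H_{k-1}
W = rng.standard_normal((o, n))        # plays the role of W_k

# Closed form from the index derivation: J[i, p, a, b] = delta_{bp} * (A H)_{ia}
AH = A @ H
J_formula = np.einsum("ia,pb->ipab", AH, np.eye(n))

# Numerical Jacobian: perturb one entry W_ab at a time.
# Z is linear in W, so a forward difference is accurate up to rounding.
eps = 1e-6
Z0 = A @ H @ W
J_numeric = np.zeros((m, n, o, n))
for a in range(o):
    for b in range(n):
        dW = np.zeros_like(W)
        dW[a, b] = eps
        J_numeric[:, :, a, b] = (A @ H @ (W + dW) - Z0) / eps

jacobian_matches = np.allclose(J_formula, J_numeric, atol=1e-6)
print(jacobian_matches)
```

The four-index array is only built here for the comparison; it is exactly the object that is awkward to write as a single matrix.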


Question:

I’m having trouble turning the derivative $\partial {Z_{(k)}}_{i p}/\partial {W_{(k)}}_{a b}$ into matrix representation and deriving a final matrix form for $\partial L/\partial \mathbf{W}_k$ in terms of $\mathbf{E}_k$.


There is 1 best solution below


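Following Greg's answer and notation, where $\mathbf{X}:\mathbf{Y}=\operatorname{tr}(\mathbf{X}^T\mathbf{Y})$ denotes the Frobenius inner product, note that $\mathbf{H}_{k-1}$ does not depend on $\mathbf{W}_k$, so $d\mathbf{Z}_k=\mathbf{A}\mathbf{H}_{k-1}\,d\mathbf{W}_k$. Therefore $$ dL = \mathbf{E}_k: d\mathbf{Z}_k = \mathbf{E}_k: \mathbf{A}\mathbf{H}_{k-1}\,d\mathbf{W}_k = (\mathbf{A}\mathbf{H}_{k-1})^T\mathbf{E}_k: d\mathbf{W}_k $$ and the gradient is $$ \frac{\partial L}{\partial \mathbf{W}_k} = (\mathbf{A}\mathbf{H}_{k-1})^T\mathbf{E}_k $$ This agrees with your index-notation result: contracting it with $\mathbf{E}_k$ gives $$ \frac{\partial L}{\partial {W_{(k)}}_{ab}}=\sum_{i,p}{E_{(k)}}_{ip}\,\delta_{bp}\,(\mathbf{A}\mathbf{H}_{k-1})_{ia}=\sum_i(\mathbf{A}\mathbf{H}_{k-1})_{ia}\,{E_{(k)}}_{ib}=\left[(\mathbf{A}\mathbf{H}_{k-1})^T\mathbf{E}_k\right]_{ab} $$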
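
The gradient formula, together with the backward recursion for $\mathbf{E}_{k-1}$ quoted in the question, can be checked against central finite differences. This is only a sketch under stated assumptions: $\sigma=\tanh$ and $f(\mathbf{Z}_K)=\tfrac12\|\mathbf{Z}_K\|_F^2$ (so that $\mathbf{E}_K=\mathbf{Z}_K$) are choices made for the example, and the sizes, `H0`, and the helper names `forward`, `loss`, `gradients`, `numeric_grad` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, dims = 4, [3, 5, 2]       # hypothetical: H_0 is m x 3, W_1 is 3 x 5, W_2 is 5 x 2
A = rng.standard_normal((m, m))
H0 = rng.standard_normal((m, dims[0]))
Ws = [rng.standard_normal((dims[k], dims[k + 1])) for k in range(len(dims) - 1)]

def forward(Ws):
    """Return (Zs, Hs) with Zs[k] = Z_{k+1} and Hs[k] = H_k (Hs[0] is H_0)."""
    Hs, Zs = [H0], []
    for W in Ws:
        Zs.append(A @ Hs[-1] @ W)
        Hs.append(np.tanh(Zs[-1]))      # assumed sigma = tanh
    return Zs, Hs

def loss(Ws):
    Zs, _ = forward(Ws)
    return 0.5 * np.sum(Zs[-1] ** 2)    # assumed f, which gives E_K = Z_K

def gradients(Ws):
    """dL/dW_k = (A H_{k-1})^T E_k, with E propagated backwards."""
    Zs, Hs = forward(Ws)
    E = Zs[-1]                          # E_K for the assumed f
    grads = [None] * len(Ws)
    for k in reversed(range(len(Ws))):  # 0-based: Ws[k] is W_{k+1}
        grads[k] = (A @ Hs[k]).T @ E
        if k > 0:
            S = 1.0 - np.tanh(Zs[k - 1]) ** 2    # sigma'(Z) for tanh
            E = (A.T @ E @ Ws[k].T) * S          # recursion from the question
    return grads

def numeric_grad(Ws, k, eps=1e-6):
    """Central finite-difference gradient of the loss w.r.t. Ws[k]."""
    G = np.zeros_like(Ws[k])
    for a in range(G.shape[0]):
        for b in range(G.shape[1]):
            Wp = [W.copy() for W in Ws]
            Wm = [W.copy() for W in Ws]
            Wp[k][a, b] += eps
            Wm[k][a, b] -= eps
            G[a, b] = (loss(Wp) - loss(Wm)) / (2 * eps)
    return G

grads = gradients(Ws)
max_err = max(np.max(np.abs(g - numeric_grad(Ws, k))) for k, g in enumerate(grads))
print(max_err)
```

Each gradient has the same shape as the corresponding $\mathbf{W}_k$, and no four-index Jacobian is ever formed; only matrix products and one element-wise product per layer are needed.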