Consider the recursive relations $$\mathbf{H}_k=\sigma(\mathbf{Z}_k),\ \mathbf{Z}_k=\mathbf{A}\mathbf{H}_{k-1}\mathbf{W}_k$$ where $\mathbf{Z}_k\in\mathbb{R}^{m\times n_k}$, $\mathbf{A}\in\mathbb{R}^{m\times m}$, $\mathbf{H}_{k-1}\in\mathbb{R}^{m\times o_k}$, $\mathbf{W}_k\in\mathbb{R}^{o_k\times n_k}$, $\sigma$ is an element-wise function, and $1\leq k\leq K$. Also, let $L=f(\mathbf{Z}_K)$, where $L\in \mathbb{R}$ and $f:\mathbb R^{m\times n_K}\to \mathbb R$.
Problem:
Given $\mathbf{E}_K:=\partial L/\partial \mathbf{Z}_K$, derive a relation to compute $\partial L/\partial \mathbf{W}_k$ for $1\leq k \leq K$.
My attempt:
Using the chain rule, I can write the derivative as $$\frac{\partial L}{\partial \mathbf{W}_{k}}=\frac{\partial L}{\partial \mathbf{Z}_{k}} \frac{\partial \mathbf{Z}_{k}}{\partial \mathbf{W}_{k}}$$ where the product on the right-hand side is understood as a contraction over the indices of $\mathbf{Z}_k$. For the first factor, write $\mathbf{E}_k:=\partial L/\partial \mathbf{Z}_k$; in this question, we derived the recursive relation $$\mathbf{E}_{k-1}=\mathbf{A}^T \mathbf{E}_{k} \mathbf{W}_{k}^T \odot \mathbf{S}_{k-1}$$ where $\mathbf{S}_k=\sigma'(\mathbf{Z}_{k})$. I can derive the second factor using index notation. Let $${Z_{(k)}}_{i p}=\sum_{j, \ell} A_{i j} {H_{(k-1)}}_{j \ell} {W_{(k)}}_{\ell p}$$ Thus $$\frac{\partial {Z_{(k)}}_{i p}}{\partial {W_{(k)}}_{a b}}=\delta_{b p} \sum_j A_{i j} {H_{(k-1)}}_{j a}$$
Question:
I’m having trouble turning the derivative $\partial {Z_{(k)}}_{i p}/\partial {W_{(k)}}_{a b}$ into matrix representation and deriving a final matrix form for $\partial L/\partial \mathbf{W}_k$ in terms of $\mathbf{E}_k$.
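Following Greg's answer and notation, where $\mathbf{X}:\mathbf{Y}=\operatorname{tr}(\mathbf{X}^T\mathbf{Y})$ denotes the Frobenius inner product, note that $\mathbf{H}_{k-1}$ does not depend on $\mathbf{W}_k$, so $d\mathbf{Z}_k=\mathbf{A}\mathbf{H}_{k-1}\,d\mathbf{W}_k$. It then holds that $$ dL = \mathbf{E}_k: d\mathbf{Z}_k = \mathbf{E}_k: \mathbf{A}\mathbf{H}_{k-1}\,d\mathbf{W}_k = (\mathbf{A}\mathbf{H}_{k-1})^T\mathbf{E}_k: d\mathbf{W}_k $$ The gradient is therefore $$ \frac{\partial L}{\partial \mathbf{W}_k} = (\mathbf{A}\mathbf{H}_{k-1})^T\mathbf{E}_k $$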
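As a sanity check, the formula $\partial L/\partial \mathbf{W}_k = (\mathbf{A}\mathbf{H}_{k-1})^T\mathbf{E}_k$ can be compared against central finite differences. The following is only a minimal NumPy sketch under assumptions that are not part of the question: $\sigma=\tanh$, the example loss $f(\mathbf{Z}_K)=\tfrac12\|\mathbf{Z}_K\|_F^2$ (so that $\mathbf{E}_K=\mathbf{Z}_K$), and small random matrices. The function names `forward`, `loss`, `gradients`, and `numerical_grad` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def forward(A, H0, Ws):
    """Return Hs = [H_0, ..., H_K] and Zs = [Z_1, ..., Z_K], assuming sigma = tanh."""
    Hs, Zs = [H0], []
    for W in Ws:
        Z = A @ Hs[-1] @ W          # Z_k = A H_{k-1} W_k
        Zs.append(Z)
        Hs.append(np.tanh(Z))       # H_k = sigma(Z_k)
    return Hs, Zs

def loss(A, H0, Ws):
    # Assumed example loss: L = 0.5 * ||Z_K||_F^2, hence E_K = Z_K.
    _, Zs = forward(A, H0, Ws)
    return 0.5 * np.sum(Zs[-1] ** 2)

def gradients(A, H0, Ws):
    """Analytic gradients dL/dW_k = (A H_{k-1})^T E_k for k = 1..K."""
    Hs, Zs = forward(A, H0, Ws)
    E = Zs[-1]                                   # E_K for the assumed loss
    grads = [None] * len(Ws)
    # List index i corresponds to layer k = i + 1, so Hs[i] = H_{k-1}.
    for i in range(len(Ws) - 1, -1, -1):
        grads[i] = (A @ Hs[i]).T @ E             # (A H_{k-1})^T E_k
        if i > 0:
            S = 1.0 - np.tanh(Zs[i - 1]) ** 2    # S_{k-1} = sigma'(Z_{k-1})
            E = (A.T @ E @ Ws[i].T) * S          # E_{k-1} = A^T E_k W_k^T (Hadamard) S_{k-1}
    return grads

def numerical_grad(A, H0, Ws, i, eps=1e-6):
    """Central finite-difference estimate of dL/dW for the matrix Ws[i]."""
    G = np.zeros_like(Ws[i])
    for idx in np.ndindex(*Ws[i].shape):
        Wp = [W.copy() for W in Ws]
        Wm = [W.copy() for W in Ws]
        Wp[i][idx] += eps
        Wm[i][idx] -= eps
        G[idx] = (loss(A, H0, Wp) - loss(A, H0, Wm)) / (2 * eps)
    return G

# Small random instance: m = 4 rows, column sizes 3 -> 5 -> 2 -> 3 (K = 3).
rng = np.random.default_rng(0)
m, dims = 4, [3, 5, 2, 3]
A = 0.5 * rng.standard_normal((m, m))
H0 = rng.standard_normal((m, dims[0]))
Ws = [0.5 * rng.standard_normal((dims[i], dims[i + 1])) for i in range(len(dims) - 1)]

analytic = gradients(A, H0, Ws)
numeric = [numerical_grad(A, H0, Ws, i) for i in range(len(Ws))]
for i, (Ga, Gn) in enumerate(zip(analytic, numeric), start=1):
    print(f"k={i}: max |analytic - numeric| = {np.max(np.abs(Ga - Gn)):.2e}")
```

Each gradient has the same shape as the corresponding $\mathbf{W}_k$, as expected from $(\mathbf{A}\mathbf{H}_{k-1})^T\in\mathbb{R}^{o_k\times m}$ and $\mathbf{E}_k\in\mathbb{R}^{m\times n_k}$, and the printed deviations should be at the level of finite-difference error.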