Using the matrices:
$$A, D, G \in \mathbb{R}^{N \times M}$$
$$\iota = \begin{bmatrix} 1\\ \vdots\\ 1 \end{bmatrix}_{N \times 1}$$
$$B, C, E, F \in \mathbb{R}^{M \times 1}$$
I want to calculate the derivatives of the following equations:
$$ D = (A - \iota \cdot B^T) \odot (\iota \cdot C^T) \\ G = D \odot (\iota \cdot E^T) + \iota \cdot F^T $$
Here $\cdot$ is the matrix product and $\odot$ is the element-wise (Hadamard) product.
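For concreteness, here is a minimal NumPy sketch of the forward computation. The sizes `N`, `M` and the random values are placeholders chosen only for illustration. It also shows that multiplying by $\iota$ simply repeats a row $N$ times, so the same result can be obtained with broadcasting.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                                   # placeholder sizes
A = rng.normal(size=(N, M))
B, C, E, F = (rng.normal(size=(M, 1)) for _ in range(4))
iota = np.ones((N, 1))                        # column vector of ones

# Literal transcription of the two equations.
D = (A - iota @ B.T) * (iota @ C.T)
G = D * (iota @ E.T) + iota @ F.T

# Equivalent form: iota @ X.T just stacks N copies of the row X.T,
# which NumPy broadcasting does implicitly.
D_b = (A - B.T) * C.T
G_b = D_b * E.T + F.T

print(np.allclose(D, D_b), np.allclose(G, G_b))
```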
My understanding is that these are all the possible partial derivatives:
$$ \frac{\partial D}{\partial A}, \frac{\partial D}{\partial \iota}, \frac{\partial D}{\partial B}, \frac{\partial D}{\partial C}, \frac{\partial G}{\partial D}, \frac{\partial G}{\partial \iota}, \frac{\partial G}{\partial E}, \frac{\partial G}{\partial F} $$
Now, I am given $\frac{\partial L}{\partial G}$. I want to calculate these partial derivatives:
$$ \frac{\partial L}{\partial \iota}, \frac{\partial L}{\partial A}, \frac{\partial L}{\partial B}, \frac{\partial L}{\partial C}, \frac{\partial L}{\partial D}, \frac{\partial L}{\partial E}, \frac{\partial L}{\partial F} $$
So far, I managed to get...
$$ \frac{\partial L}{\partial F} = \left( \frac{\partial L}{\partial G} \right)^T \cdot \frac{\partial G}{\partial F} $$
$$ \frac{\partial L}{\partial D} = \frac{\partial L}{\partial G} \odot \frac{\partial G}{\partial D} $$
I think we can ignore $\frac{\partial L}{\partial \iota}$ because the values of $\iota$ are constant. But then I am immediately stuck on $\frac{\partial L}{\partial E}$ and the other components.
How do I solve for the remaining four partials: $\frac{\partial L}{\partial A}, \frac{\partial L}{\partial B}, \frac{\partial L}{\partial C}, \frac{\partial L}{\partial E}$?
$ \def\l{\lambda}\def\o{{\iota}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\gg{\LR{\grad{\l}{G}}} \def\ga{\LR{\grad{\l}{A}}} $Let's use a convention wherein an uppercase letter denotes a matrix, a lowercase letter a vector, and a Greek letter a scalar. This means renaming the following problem variables $$\big\{B,C,E,F\big\}\to \big\{b,c,e,f\big\}$$ and writing the scalar loss $L$ as $\l$. The uppercase letters are then free to denote diagonal matrices whose main diagonals are the lowercase vectors, i.e. $$\eqalign{ B = \Diag{b},\quad C = \Diag{c},\quad E = \Diag{e},\quad I = \Diag{\o} = {\it Identity\;Matrix} }$$
Diagonal matrices can replace Hadamard products via the following rule $$\eqalign{ M\odot\LR{b\cdot c^T} &= B\cdot M\cdot C \\ }$$ In particular, setting $b=\o$ (so that $B=I$) gives $M\odot\LR{\o\cdot c^T} = M\cdot C$. Therefore $$\eqalign{ D &= {A\cdot C-\o\cdot b^T\cdot C} \\ G &= {D\cdot E+\o\cdot f^T} \\ }$$
Finally, let's use a colon to denote the Frobenius product $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A\cdot B^T} \\ A:A &= \big\|A\big\|^2_F \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=\tt1)$ it reduces to an ordinary dot product.
The properties of the underlying trace function allow the terms in such a product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{A\cdot B} &= \LR{C\cdot B^T}:A = \LR{A^T\cdot C}:B \\\\ }$$
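As a sanity check, the Hadamard-to-diagonal rule and the Frobenius rearrangement rules can be verified numerically. The following is only a sketch using NumPy with arbitrary random test matrices; the sizes and the helper name `frob` are illustrative choices and not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 4, 3, 5                       # arbitrary test sizes

# Rule: M ⊙ (b c^T) = Diag(b) · M · Diag(c)
Mx = rng.normal(size=(n, m))
b = rng.normal(size=n)
c = rng.normal(size=m)
hadamard_ok = np.allclose(Mx * np.outer(b, c),
                          np.diag(b) @ Mx @ np.diag(c))

def frob(X, Y):
    """Frobenius product X:Y, the sum of element-wise products."""
    return float(np.sum(X * Y))

A = rng.normal(size=(n, k))
B = rng.normal(size=(k, m))
C = rng.normal(size=(n, m))

# X:Y = Tr(X · Y^T)
trace_ok = np.isclose(frob(C, A @ B), np.trace(C @ (A @ B).T))

# C:(A·B) = (C·B^T):A = (A^T·C):B
rearrange_ok = (np.isclose(frob(C, A @ B), frob(C @ B.T, A))
                and np.isclose(frob(C, A @ B), frob(A.T @ C, B)))

print(hadamard_ok, trace_ok, rearrange_ok)
```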
Use the given gradient to write the differential of the function in terms of $G$, then change the independent variable from $G\to D\to A$, then recover the gradient wrt $A$. $$\eqalign{ d\l &= \gg:dG \\ &= \gg:\LR{dD\cdot E} \\ &= \LR{\gg\cdot E}:{dD} \\ &= \LR{\gg\cdot E}:\LR{dA\cdot C} \\ &= \LR{\gg\cdot E\cdot C}:{dA} \\ \ga &= \gg\cdot E\cdot C \;\;\doteq\; \gg\odot\LR{\o\cdot e^T}\odot\LR{\o\cdot c^T} \\ }$$ The other gradients can be calculated in a similar fashion.
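Applying the same steps to the other variables should give, if I have carried the algebra through correctly,
$$\frac{\partial\lambda}{\partial f} = \left(\frac{\partial\lambda}{\partial G}\right)^T\iota,\quad \frac{\partial\lambda}{\partial e} = \left(D\odot\frac{\partial\lambda}{\partial G}\right)^T\iota,\quad \frac{\partial\lambda}{\partial b} = -\left(\frac{\partial\lambda}{\partial A}\right)^T\iota,\quad \frac{\partial\lambda}{\partial c} = \left(\left(A-\iota\cdot b^T\right)\odot\left(\frac{\partial\lambda}{\partial G}\cdot E\right)\right)^T\iota$$
The NumPy sketch below compares these formulas against central finite differences. The loss used here, $\lambda = W:G$ with a random weight matrix `W`, is a hypothetical stand-in chosen so that the given gradient is exactly `W`; the sizes and the helper names are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 3                                  # arbitrary test sizes
A = rng.normal(size=(N, M))
b, c, e, f = (rng.normal(size=M) for _ in range(4))
W = rng.normal(size=(N, M))                  # hypothetical: defines the test loss
iota = np.ones(N)

def forward(A, b, c, e, f):
    D = (A - np.outer(iota, b)) @ np.diag(c)
    G = D @ np.diag(e) + np.outer(iota, f)
    return D, G

def loss(A, b, c, e, f):
    # Test loss lambda = W:G, so d(lambda)/dG equals W.
    return float(np.sum(W * forward(A, b, c, e, f)[1]))

D, G = forward(A, b, c, e, f)
dG = W                                       # the "given" gradient
dA = dG @ np.diag(e) @ np.diag(c)            # gradient derived in the answer

grads = {
    "A": dA,
    "b": -(dA.T @ iota),
    "c": ((A - np.outer(iota, b)) * (dG @ np.diag(e))).T @ iota,
    "e": (D * dG).T @ iota,
    "f": dG.T @ iota,
}

def numeric_grad(name, h=1e-6):
    """Central finite-difference gradient of the test loss wrt one variable."""
    args = {"A": A, "b": b, "c": c, "e": e, "f": f}
    g = np.zeros_like(args[name])
    for idx in np.ndindex(g.shape):
        plus = {k: v.copy() for k, v in args.items()}
        minus = {k: v.copy() for k, v in args.items()}
        plus[name][idx] += h
        minus[name][idx] -= h
        g[idx] = (loss(**plus) - loss(**minus)) / (2 * h)
    return g

checks = {k: bool(np.allclose(grads[k], numeric_grad(k), atol=1e-6))
          for k in grads}
print(checks)
```

Every entry of `checks` should be `True` if the formulas are right. Since multiplying by $\iota$ just sums over rows, each `X.T @ iota` above could equally be written as `X.sum(axis=0)`.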