@Steph kindly answered my other question, but I can't work out the math.
He said that "The correct way to apply chain rule with matrices is to use differentials", and provided the answer for $\partial E \over \partial W_4$.
OK, suppose $\partial E \over \partial A_5$ is known to be $(A_5-R)$; then that answer checks out, no problem.
Now if I want to use the same approach to calculate $\partial E \over \partial W_3$, it should be
$dE={\partial E \over \partial A_5}:dA_5$
$dE=W_4^T{\partial E \over \partial A_5}:dA_4$
$dE=A_3^TW_4^T{\partial E \over \partial A_5}:dW_3$
${\partial E \over \partial W_3}=A_3^TW_4^T(A_5-R)$
The "order" is wrong!
To make it right, the $A$ factor has to go at the very front, and each $W$ has to be appended at the very end at every step.
Why is that!?
Why does the same operation $(dA_5=dA_4W_4)$ produce answers in different positions?
The only possible, if far-fetched, relationship I can find is this: because $A_4$ is "in front", its answer $(A_4^T)$ always goes at the front, and because $W_4$ is "at the end", its answer $(W_4^T)$ always goes at the very end.
Is that the right reason, or am I just overthinking it?
Thank you very much for your help!
$ \def\SSS{\sum_{i=1}^m\sum_{j=1}^n\sum_{k=1}^p} \def\A{A_{ij}} \def\B{B_{ik}} \def\BT{B_{ki}^T} \def\C{C_{kj}} \def\CT{C_{jk}^T} \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} $
To extend my comment above, by expanding the various products
$$\eqalign{ A:\LR{BC} &= \SSS \A\BR{\B\C} \\ \LR{AC^T}:B &= \SSS \BR{\A\CT}\B \\ \LR{B^TA}:C &= \SSS \BR{\BT\A}\C \\ }$$
it is obvious that the sums on the RHS are all identical, therefore the Frobenius (aka double-dot) products appearing on the LHS are likewise identical.
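For anyone who prefers to see numbers, here is a minimal NumPy sketch that checks the three products against the triple sum on random matrices. The helper name `frob`, the sizes `m, n, p`, and the random seed are my own choices for illustration, not anything from the question.

```python
import numpy as np

def frob(X, Y):
    """Frobenius (double-dot) product X:Y = sum_ij X_ij * Y_ij."""
    return float(np.sum(X * Y))

# Arbitrary sizes (assumption): A is m x n, B is m x p, C is p x n,
# so that BC has the same shape as A.
rng = np.random.default_rng(0)
m, n, p = 3, 4, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
C = rng.standard_normal((p, n))

prod_1 = frob(A, B @ C)      # A : (BC)
prod_2 = frob(A @ C.T, B)    # (A C^T) : B
prod_3 = frob(B.T @ A, C)    # (B^T A) : C

# The common right-hand side: sum over i, j, k of A_ij * B_ik * C_kj.
triple_sum = float(np.einsum("ij,ik,kj->", A, B, C))

print(np.allclose([prod_1, prod_2, prod_3], triple_sum))
```

All four numbers should agree up to floating-point rounding, whichever random matrices are drawn.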
This equivalence could also be arrived at by considering the properties of the trace function when its matrix argument is transposed and/or cyclically permuted.
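As a sketch of how this applies to the question: write $G={\partial E\over\partial A_5}=A_5-R$ and assume the layers are purely linear, $A_5=A_4W_4$ and $A_4=A_3W_3$ (the second relation is my reading of the question's last step, not something stated explicitly). Using $A:B={\rm Tr}\left(A^TB\right)$, the rearrangement rules and the resulting gradient are
$$\eqalign{
A:\left(BC\right) &= {\rm Tr}\left(A^TBC\right) = {\rm Tr}\left(CA^TB\right) = \left(AC^T\right):B \\
A:\left(BC\right) &= {\rm Tr}\left(\left(B^TA\right)^TC\right) = \left(B^TA\right):C \\
dE &= G:dA_5 = G:dA_4\,W_4 = GW_4^T:dA_4 \\
&= GW_4^T:A_3\,dW_3 = A_3^TGW_4^T:dW_3 \\
{\partial E\over\partial W_3} &= A_3^T\left(A_5-R\right)W_4^T \\
}$$
So a factor that sits to the right of the differential ($W_4$) is transposed and moved to the right of $G$, while a factor that sits to the left of the differential ($A_3$) is transposed and moved to the left. Under these assumptions, the pattern noticed in the question is exactly what the identities produce.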