Deriving gradients for a numerical method with partial derivatives, vectors, matrices and scalar products


I need help in finding a way to combine the equations \begin{equation} \frac{\partial J}{\partial W} \cdot \delta W = \langle Y_M^T (\eta^{'}(Y_M W) \odot (\eta(Y_MW)-C)),\delta W \rangle \end{equation}

\begin{equation} \langle\frac{\partial J}{\partial Y_M} , \delta Y_M\rangle_F = \langle (\eta^{'}(Y_MW)\odot (\eta(Y_MW)-C))W^T, \delta Y_M \rangle_F \end{equation}

\begin{equation} \delta Y_M = \frac{\partial Y_M}{\partial K_m} \cdot \delta K_m = \frac{\partial \phi}{\partial Y }(Y_{M-1},K_{M-1})...\frac{\partial \phi}{\partial Y }(Y_{m+1},K_{m+1})\frac{\partial \phi}{\partial K }(Y_{m},K_{m})\delta K_m \end{equation}

to find expressions for \begin{equation} \frac{\partial J}{\partial K_m} \text{, m = 0,...,M-1 and } \frac{\partial J}{\partial W}. \end{equation}

Here $Y_{m+1} = \phi(Y_m,K_m)$ denotes one step of an ODE solver (e.g. Euler's method) that updates an $n \times 4$ matrix $Y_m$. $\langle A,B\rangle_F = \sum_{i,j} a_{ij}b_{ij}$ is the Frobenius inner product between two matrices of the same size. $C$ is an $n \times 1$ vector. $\eta$ is a function that maps $\mathbb{R}^{a\times b}$ to $\mathbb{R}^{a\times b}$. $W$ is a $4 \times 1$ vector (so that $Y_M W$ is $n \times 1$, the same size as $C$). $J = J(K_0,K_1,\ldots,K_{M-1},W)$ maps $M$ matrices of size $4 \times 4$ and the vector $W$ to a real number. $\odot$ is the Hadamard (elementwise) product. $K$ is a structure containing the $M$ matrices $K_0,\ldots,K_{M-1}$, each $4 \times 4$.
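To make the setup concrete, here is a minimal numerical sketch. The question does not specify $\phi$, $\eta$, or $J$, so the following are assumptions chosen only so that the shapes and the gradient $g = \eta'(Y_MW)\odot(\eta(Y_MW)-C)$ come out as stated: $\phi(Y,K) = Y + h\,\tanh(YK)$ (a forward-Euler, ResNet-style step), $\eta = \tanh$ elementwise, and $J = \tfrac12\|\eta(Y_MW)-C\|^2$, whose differential produces exactly that $g$.

```python
import numpy as np

h = 0.1  # assumed step size (not given in the question)

def phi(Y, K):
    """One assumed ODE step: updates an n-by-4 state Y using a 4-by-4 matrix K."""
    return Y + h * np.tanh(Y @ K)

def forward(Y0, Ks, W, C):
    """Run M steps Y_{m+1} = phi(Y_m, K_m), then evaluate the assumed loss
    J = 0.5 * ||eta(Y_M W) - C||^2 with eta = tanh elementwise."""
    Y = Y0
    for K in Ks:                    # Ks holds K_0, ..., K_{M-1}
        Y = phi(Y, K)
    r = np.tanh(Y @ W) - C          # eta(Y_M W) - C, an n-by-1 residual
    return 0.5 * np.sum(r * r)

# Hypothetical test data with the dimensions from the question.
rng = np.random.default_rng(0)
n, M = 5, 3
Y0 = rng.standard_normal((n, 4))
Ks = [rng.standard_normal((4, 4)) for _ in range(M)]
W  = rng.standard_normal((4, 1))   # 4-by-1, so Y_M W matches C
C  = rng.standard_normal((n, 1))
print(forward(Y0, Ks, W, C))
```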

All help is appreciated, anything from a suggestion of where to start to a complete derivation. My idea is to use total differentials, the chain rule and/or the product rule.

Best answer:

Doesn't look like you need too much help, you're mostly done.

If I may introduce a few notations for typing convenience:
$$\eqalign{
\eta &= \eta(Y_MW) &{\rm\,\,\,[vector]} \cr
g &= \eta\odot(\eta-C) \cr
{\mathcal A_j} &= \frac{\partial\phi}{\partial Y}(Y_j,K_j) &{\rm\,\,\,[4th\,order\,tensor]}\cr
{\mathcal B_j} &= \frac{\partial\phi}{\partial K}(Y_j,K_j) &{\rm\,\,\,[4th\,order\,tensor]}\cr
{\mathcal P_{jk}} &= {\mathcal A_{j-1}}:{\mathcal A_{j-2}}:\ldots:{\mathcal A_{k+1}} &{\rm\,\,\,[\,j>k+1\,]} \cr
{\mathcal C} &= {\mathcal A}:{\mathcal B} \,\,\,\,\,\,\implies &\,\,{\mathcal C}_{ijmn}=\sum_{kl} {\mathcal A}_{ijkl}{\mathcal B}_{klmn} \cr
}$$
Note that the double-contraction product notation (:) can be used with matrices and vectors, as well as tensors. For example
$$\eqalign{
X:Y &= \langle X,Y\rangle_F \cr
x:y &= x\cdot y \cr
}$$
Your first equation can be solved immediately for one of the gradients:
$$\eqalign{
dJ &= Y_M^Tg:dW \cr
\frac{\partial J}{\partial W} &= Y_M^Tg \cr
}$$
And substituting the third equation
$$dY_M = {\mathcal P_{Mm}}:{\mathcal B_{m}}:dK_m$$
into the second,
$$dJ = gW^T : dY_M,$$
yields the other gradient:
$$\eqalign{
dJ &= gW^T :\Big({\mathcal P_{Mm}}:{\mathcal B_{m}}:dK_m\Big) \cr
\frac{\partial J}{\partial K_m} &= gW^T : {\mathcal P_{Mm}} : {\mathcal B_{m}} \cr
}$$
NB: In my notation, components of the gradient tensors are indexed like so:
$$\Big(\frac{\partial\phi}{\partial Y}\Big)_{ijkl} = \frac{\partial\phi_{ij}}{\partial Y_{kl}}$$
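In code one rarely forms the 4th-order tensors ${\mathcal A_j}$, ${\mathcal B_j}$ explicitly; instead the contractions $gW^T:{\mathcal P_{Mm}}:{\mathcal B_m}$ are accumulated as vector-Jacobian products in a reverse sweep. Below is a sketch under the same assumptions as before (these are not from the question: $\phi(Y,K)=Y+h\tanh(YK)$, $\eta=\tanh$, $J=\tfrac12\|\eta(Y_MW)-C\|^2$), with a finite-difference check of one entry of $\partial J/\partial K_0$.

```python
import numpy as np

h = 0.1  # assumed step size

def phi(Y, K):
    return Y + h * np.tanh(Y @ K)          # assumed ODE step

def forward_states(Y0, Ks):
    """Store every state: Ys[m] = Y_m, so Ys[-1] = Y_M."""
    Ys = [Y0]
    for K in Ks:
        Ys.append(phi(Ys[-1], K))
    return Ys

def loss(YM, W, C):
    r = np.tanh(YM @ W) - C                # eta(Y_M W) - C
    return 0.5 * np.sum(r * r)

def gradients(Y0, Ks, W, C):
    Ys = forward_states(Y0, Ks)
    YM = Ys[-1]
    z = YM @ W
    g = (1 - np.tanh(z) ** 2) * (np.tanh(z) - C)   # g = eta'(Y_M W) .* (eta(Y_M W) - C)
    dW = YM.T @ g                                  # dJ/dW   = Y_M^T g
    G = g @ W.T                                    # dJ/dY_M = g W^T
    dKs = [None] * len(Ks)
    for m in range(len(Ks) - 1, -1, -1):           # reverse sweep: contract with A_j, B_m
        S = 1 - np.tanh(Ys[m] @ Ks[m]) ** 2        # elementwise tanh' at step m
        dKs[m] = h * Ys[m].T @ (G * S)             # G : B_m  ->  dJ/dK_m
        G = G + h * (G * S) @ Ks[m].T              # G : A_m  ->  dJ/dY_m
    return dW, dKs

# Finite-difference check on one entry of K_0 (hypothetical test data).
rng = np.random.default_rng(1)
n, M = 5, 3
Y0 = rng.standard_normal((n, 4))
Ks = [rng.standard_normal((4, 4)) for _ in range(M)]
W = rng.standard_normal((4, 1))
C = rng.standard_normal((n, 1))
dW, dKs = gradients(Y0, Ks, W, C)

eps = 1e-6
Kp = [K.copy() for K in Ks]; Kp[0][1, 2] += eps
Km = [K.copy() for K in Ks]; Km[0][1, 2] -= eps
fd = (loss(forward_states(Y0, Kp)[-1], W, C)
      - loss(forward_states(Y0, Km)[-1], W, C)) / (2 * eps)
print(abs(fd - dKs[0][1, 2]))   # difference should be near zero
```

The reverse loop is exactly the product ${\mathcal P_{Mm}}$ applied incrementally: each pass multiplies the running adjoint $G$ by one factor ${\mathcal A_m}$, and peels off $\partial J/\partial K_m$ by contracting with ${\mathcal B_m}$, so the cost is linear in $M$ instead of quadratic.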