$\frac{\partial}{\partial X_{ij}} \sum_k\sum_l\sum_m\sum_n X_{lk}C_{lm}X_{mn}N_{nk}=\sum_m\sum_n C_{im}X_{mn}N_{nj} + \sum_k\sum_l X_{lk}C_{li}N_{jk}$


Is there any way (i.e. a formula) to compute the derivative of the summation in the second step below without having to separate the summation into cases according to whether particular indices are equal to $i$ or $j$?

$$ \frac{\partial}{\partial X_{ij}} trace\{X^TCXN\} = \frac{\partial}{\partial X_{ij}} \sum_k\sum_l\sum_m\sum_n X_{lk}C_{lm}X_{mn}N_{nk}$$ $$= \sum_m\sum_n C_{im}X_{mn}N_{nj} + \sum_k\sum_l X_{lk}C_{li}N_{jk}$$ $$=(CXN)_{ij} + (C^TXN^T)_{ij}$$

For example, the way I analysed it was to separate the sum over $k$ into a sum over $k\neq j$ and the single term with $k=j$, and then to keep splitting summations like this until I could evaluate the derivative.

This worked and I got the correct answer; however, it is very time-consuming.

Is there a formula I can use when I have sums such as these that would let me compute the derivative quickly, without separating them into so many individual summations? What makes it particularly difficult is that the variable $X$ occurs at least twice with given indices, and the indices summed over for the different occurrences are not the same.

The reason I ask is that this was presented as a single step in a solution, and I'm curious whether there is another way besides my approach, which was very time-consuming.

EDIT: For example, in computing the derivative of $Tr\{X^TXX^TX\}$ with respect to $X$, which I've been working through, I have eight summations. It would be great if there were a better way.
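
As a quick numerical sanity check on both target gradients, here is a minimal numpy sketch; the random test matrices, step size, tolerances, and the ad hoc helper `num_grad` are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X, C, N = rng.standard_normal((3, n, n))

def num_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f with respect to each X_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# tr(X^T C X N): the claimed gradient is CXN + C^T X N^T.
f1 = lambda X: np.trace(X.T @ C @ X @ N)
print(np.allclose(num_grad(f1, X), C @ X @ N + C.T @ X @ N.T, atol=1e-6))  # True

# tr(X^T X X^T X): the claimed gradient is 4 X X^T X.
f2 = lambda X: np.trace(X.T @ X @ X.T @ X)
print(np.allclose(num_grad(f2, X), 4 * X @ X.T @ X, atol=1e-5))  # True
```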


There are 3 solutions below.

Solution 1

Getting a handle on how to use the total derivative can knock these problems over in a flash.

Let $f \colon M_n(\mathbb{R}) \to M_n(\mathbb{R})$ be the function $f(X) = X^T C X N$. Then its derivative at $X \in M_n(\mathbb{R})$ in the direction $D \in M_n(\mathbb{R})$ can be found as the coefficient of $t$ in $$ \begin{aligned} f(X + tD) &= (X + tD)^T C (X + tD) N \\ &= f(X) + t(D^T C X N + X^T C D N) + t^2(\ldots), \end{aligned} $$ hence $df_X(D) = D^T C X N + X^T C D N$. The trace is linear, so its derivative is itself: $d \operatorname{tr}_X(D) = \operatorname{tr}(D)$. By the chain rule, $$ d(\operatorname{tr} \circ f)_X(D) = \operatorname{tr}(D^T C X N) + \operatorname{tr}(X^T C D N).$$ To take the partial derivative with respect to $X_{ij}$, we really mean to take the total derivative at the point $X$, evaluated on the coordinate matrix $E_{ij}$ (note $E_{ij}^T = E_{ji}$): $$ \begin{aligned} d(\operatorname{tr} \circ f)_X(E_{ij}) &= \operatorname{tr}(E_{ji} C X N) + \operatorname{tr}(X^T C E_{ij} N) \\ &= (C X N)_{ij} + (C^T X N^T)_{ij}. \end{aligned}$$
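
Here is a small numpy sketch of the same computation, assuming random test matrices; the direction $D$, step size $t$, and index pair $(i, j)$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
X, C, N, D = rng.standard_normal((4, n, n))

tr_f = lambda X: np.trace(X.T @ C @ X @ N)

# Numerical directional derivative of tr(f(X)) at X in the direction D ...
t = 1e-6
numeric = (tr_f(X + t * D) - tr_f(X - t * D)) / (2 * t)

# ... matches the coefficient-of-t formula tr(D^T C X N) + tr(X^T C D N).
analytic = np.trace(D.T @ C @ X @ N) + np.trace(X.T @ C @ D @ N)
print(np.isclose(numeric, analytic))  # True

# Evaluating on a coordinate matrix E_ij picks out the (i, j) partial derivative.
i, j = 2, 3
E = np.zeros((n, n)); E[i, j] = 1.0
partial = np.trace(E.T @ C @ X @ N) + np.trace(X.T @ C @ E @ N)
print(np.isclose(partial, (C @ X @ N + C.T @ X @ N.T)[i, j]))  # True
```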

Solution 2

$ \def\d{\delta}\def\o{{\tt1}}\def\p{\partial} \def\B{\Big}\def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\B(#1\B)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\S#1{\sum_{#1}} \def\c#1{\color{red}{#1}} $You are almost using index notation, i.e. the most powerful way to approach such problems.

The next step is to adopt the Einstein summation convention, wherein a repeated index implies summation over that index, so that explicit summation symbols can be omitted: $$\eqalign{ \S k\S l\S m\S n X_{lk}C_{lm}X_{mn}N_{nk} \qiq X_{lk}C_{lm}X_{mn}N_{nk} }$$ The next thing to master is the behavior of the Kronecker delta under summation $$\eqalign{ A_{ij}\d_{jk} &= A_{ik} = A_{ki}^T }$$ and the matrix self-gradient $$\eqalign{ \grad{X_{lk}}{X_{ij}} &= {\d_{il}\d_{jk}} }$$ All of this machinery allows the current problem to be dealt with very mechanically $$\eqalign{ \grad{\trace{X^TCXN}}{X_{ij}} &= \grad{\LR{X_{lk}C_{lm}X_{mn}N_{nk}}}{X_{ij}} \\ &= \c{\d_{il}\d_{jk}}C_{lm}X_{mn}N_{nk} + X_{lk}C_{lm}\c{\d_{im}\d_{jn}}N_{nk} \\ &= C_{im}X_{mn}N_{nj} + X_{lk}C_{li}N_{jk} \\ &= C_{im}X_{mn}N_{nj} + C_{il}^TX_{lk}N_{kj}^T \\ &= \LR{CXN + C^TXN^T}_{ij} \\ }$$ as well as the quartic problem $$\eqalign{ \grad{\trace{X^TXX^TX}}{X_{ij}} &= \grad{\LR{X_{kn}X_{kl}X_{ml}X_{mn}}}{X_{ij}} \\ \\ &= \c{\d_{ik}\d_{jn}}X_{kl}X_{ml}X_{mn} \\ &+\; X_{kn}\c{\d_{ik}\d_{jl}}X_{ml}X_{mn} \\ &+\; X_{kn}X_{kl}\c{\d_{im}\d_{jl}}X_{mn} \\ &+\; X_{kn}X_{kl}X_{ml}\c{\d_{im}\d_{jn}} \\ \\ &= X_{il}X_{ml}X_{mj} + X_{in}X_{mj}X_{mn} \\ &+\; X_{kn}X_{kj}X_{in} + X_{kj}X_{kl}X_{il} \\ \\ &= X_{il}X_{lm}^TX_{mj} + X_{in}X_{nm}^TX_{mj} \\ &+\; X_{in}X_{nk}^TX_{kj} + X_{il}X_{lk}^TX_{kj} \\ \\ &= \LR{4\,XX^TX}_{ij} \\ }$$
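
Incidentally, `np.einsum` implements exactly this summation convention, so every line above can be checked mechanically. A small sketch, assuming random $X$, $C$, $N$, with each subscript string transcribing an index expression from the derivation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X, C, N = rng.standard_normal((3, n, n))

# The trace itself, written as the repeated-index sum X_lk C_lm X_mn N_nk.
scalar = np.einsum('lk,lm,mn,nk->', X, C, X, N)
print(np.isclose(scalar, np.trace(X.T @ C @ X @ N)))  # True

# The two surviving gradient terms, C_im X_mn N_nj and C_li X_lk N_jk.
grad = (np.einsum('im,mn,nj->ij', C, X, N)
        + np.einsum('li,lk,jk->ij', C, X, N))
print(np.allclose(grad, C @ X @ N + C.T @ X @ N.T))  # True

# The four delta-contracted terms of the quartic problem sum to 4 X X^T X.
quartic = (np.einsum('il,ml,mj->ij', X, X, X)
           + np.einsum('in,mj,mn->ij', X, X, X)
           + np.einsum('kn,kj,in->ij', X, X, X)
           + np.einsum('kj,kl,il->ij', X, X, X))
print(np.allclose(quartic, 4 * X @ X.T @ X))  # True
```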

Solution 3

Here's a way I found to do this, using differentials, which I think is a better approach than using the Kronecker delta.

For an introduction to using differentials for matrix calculus, see Matrix Differential Calculus with Applications in Statistics and Econometrics by Magnus and Neudecker.

$$\begin{align}d \space trace(X^TCXN)&= trace\{ d \space (X^TCXN) \}\\ & = trace \{(dX^T)CXN+X^TC(dX)N\} \\ &= trace\{(dX^T)CXN\}+trace\{X^TC(dX)N\} \\ &= trace\{N^TX^TC^T(dX)\}+trace\{NX^TC(dX)\} \\ &= trace\{[N^TX^TC^T+NX^TC](dX)\} \end{align}$$

Therefore in numerator layout:

$$\frac{\partial \space trace(X^TCXN)}{\partial X}=N^TX^TC^T+NX^TC$$

or in denominator layout (taking the transpose):

$$\frac{\partial \space trace(X^TCXN)}{\partial X}=CXN+C^TXN^T$$
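
As a sketch of why the identification $d\,trace = trace(A\,dX) \implies \partial\,trace/\partial X = A$ (numerator layout) is safe, here is a numerical check assuming random matrices and a small (but otherwise arbitrary) perturbation standing in for $dX$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X, C, N, dX = rng.standard_normal((4, n, n))
dX *= 1e-6  # small perturbation standing in for the differential dX

f = lambda X: np.trace(X.T @ C @ X @ N)

# First-order change of the trace matches trace([N^T X^T C^T + N X^T C] dX).
A = N.T @ X.T @ C.T + N @ X.T @ C          # numerator-layout derivative
print(np.isclose(f(X + dX) - f(X), np.trace(A @ dX), atol=1e-8))  # True

# The denominator-layout form is its transpose.
print(np.allclose(A.T, C @ X @ N + C.T @ X @ N.T))  # True
```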