Partial Derivative of a Matrix Summation with respect to a Component


I am trying to take the partial derivative of the following expression with respect to the component $[A]_{i, j}$, where both matrices $A$ and $B$ are $d\times d$ and $B$ is diagonal. The expression is the $(i, j)$ component of the product $ABAA$, simplified using the fact that $B$ is diagonal.

$$ \sum_{k=1}^{d} \sum_{l=1}^{d} [A]_{i, k} [B]_{k, k}[A]_{k, l}[A]_{l, j} $$

I know that I need to isolate the terms in the double summation that contain $[A]_{i, j}$ and ignore the rest. However, the many indices have confused me. As far as I can see, the terms containing $[A]_{i, j}$ are the ones for which one of the following conditions holds:

  1. $k=i$ and $l=j$. In this case, the partial derivative of that term is just $[A]_{i, i}[B]_{i, i} [A]_{j, j}$.
  2. $k=j$ and $l$ can be anything. In this case, if $l \neq i$, then the partial derivative of these terms is $\sum_{l=1,\, l \neq i}^{d} [B]_{j, j}[A]_{j, l}[A]_{l, j}$, since the only place where $[A]_{i, j}$ appears is the first factor. But if $l = i$, then we get $2[A]_{i, j}[B]_{j, j} [A]_{j, i}$, since both the first and last factors are $[A]_{i, j}$.
  3. $l=i$ and $k$ can be anything. In this case, if $k \neq j$, then the partial derivative of these terms is $\sum_{k=1,\, k \neq j}^{d} [A]_{i, k}[B]_{k, k}[A]_{k, i}$, since $[A]_{i, j}$ appears only in the last factor. If $k = j$, then we get $2 [A]_{i, j}[B]_{j, j} [A]_{j, i}$.

My confusion comes from trying to figure out how to combine all these scenarios together into a nice form, since there is overlap between the 3 cases.
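The case analysis above can be checked mechanically. The sketch below is plain Python with a hypothetical helper `occurrences`. It loops over every pair $(k, l)$ in the double sum and counts how many of the factors $[A]_{i,k}$, $[A]_{k,l}$, $[A]_{l,j}$ are the entry $[A]_{i,j}$. Indices are 0-based, and the example assumes $i \neq j$.

```python
def occurrences(i, j, d):
    """Map each (k, l) in the double sum to how many A-factors equal A[i][j].

    The (k, l) term is A[i][k] * B[k][k] * A[k][l] * A[l][j]; only the three
    A-factors can coincide with the entry (i, j).
    """
    counts = {}
    for k in range(d):
        for l in range(d):
            factors = [(i, k), (k, l), (l, j)]  # index pairs of the A-factors
            n = factors.count((i, j))
            if n:
                counts[(k, l)] = n
    return counts


# Example with i = 0, j = 1 and d = 3 (0-based indices).
counts = occurrences(i=0, j=1, d=3)
for (k, l), n in sorted(counts.items()):
    print(f"k={k}, l={l}: A[0][1] appears {n} time(s)")
```

For $i \neq j$ the counts always total $2d+1$. There are $d$ pairs with $k=j$ (first factor), $d$ pairs with $l=i$ (last factor), and the single pair $(k,l)=(i,j)$ (middle factor). The pair $(k,l)=(j,i)$ belongs to both of the first two families and carries a count of 2, which is exactly the overlap between cases 2 and 3.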


$ \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\Aij{A_{ij}} \def\Eij{{\widehat E}_{ij}} $

First recall that the gradient of a matrix with respect to its $(i,j)^{th}$ component is the matrix $\Eij$ whose $(i,j)$ component equals $\o$ and whose other components all equal $0$:
$$\eqalign{ \grad A{\Aij} &= \Eij \\ }$$
Differentiating your matrix product with the product rule yields
$$\eqalign{ P &= ABAA \\ \grad P{\Aij} &= \Eij BAA + AB\Eij A + ABA\Eij \\ }$$
The fact that $B$ is a diagonal matrix isn't especially interesting, except that it permits the following substitutions in two of the terms:
$$\eqalign{ \Eij B &= \Eij B_{jj} \\ B\Eij &= B_{ii} \Eij \\ }$$
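Reading off the $(i,j)$ component of each term gives the scalar derivative asked about in the question. Using $[\Eij M]_{ij} = M_{jj}$, $[M\Eij]_{ij} = M_{ii}$, $[M\Eij N]_{ij} = M_{ii}N_{jj}$ and the diagonal substitutions, a sketch of that last step is
$$\eqalign{ \grad{P_{ij}}{\Aij} &= B_{jj}\LR{AA}_{jj} + \LR{AB}_{ii}A_{jj} + \LR{ABA}_{ii} \\ &= \sum_{l=1}^{d} B_{jj}A_{jl}A_{lj} + A_{ii}B_{ii}A_{jj} + \sum_{k=1}^{d} A_{ik}B_{kk}A_{ki} \\ }$$
These are the question's cases 2, 1 and 3, with the sums now running over all indices. For $i \neq j$, the $l=i$ term of the first sum and the $k=j$ term of the last sum each contribute $A_{ij}B_{jj}A_{ji}$; together they give the $2[A]_{i,j}[B]_{j,j}[A]_{j,i}$ from the question, so the product rule handles the overlap without any special treatment.

As a sanity check, the plain-Python sketch below compares this formula against a central finite difference of the original double sum, for a random $4\times 4$ matrix $A$ and a random diagonal $B$ stored as the list `b`. The helper names `p_entry`, `closed_form` and `central_difference` are hypothetical, chosen for illustration, and indices are 0-based.

```python
import random


def p_entry(A, b, i, j):
    """(i, j) entry of P = A B A A with B = diag(b), via the double sum."""
    d = len(A)
    return sum(A[i][k] * b[k] * A[k][l] * A[l][j]
               for k in range(d) for l in range(d))


def closed_form(A, b, i, j):
    """dP_ij / dA_ij as the sum of the three product-rule terms."""
    d = len(A)
    term1 = sum(b[j] * A[j][l] * A[l][j] for l in range(d))  # [E_ij B A A]_ij
    term2 = A[i][i] * b[i] * A[j][j]                          # [A B E_ij A]_ij
    term3 = sum(A[i][k] * b[k] * A[k][i] for k in range(d))  # [A B A E_ij]_ij
    return term1 + term2 + term3


def central_difference(A, b, i, j, h=1e-5):
    """Numerical dP_ij / dA_ij, perturbing only the single entry A[i][j]."""
    Ap = [row[:] for row in A]
    Ap[i][j] += h
    Am = [row[:] for row in A]
    Am[i][j] -= h
    return (p_entry(Ap, b, i, j) - p_entry(Am, b, i, j)) / (2 * h)


random.seed(0)
d = 4
A = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
b = [random.uniform(-1, 1) for _ in range(d)]

# Largest disagreement over every (i, j), diagonal entries included.
max_err = max(abs(closed_form(A, b, i, j) - central_difference(A, b, i, j))
              for i in range(d) for j in range(d))
print(f"largest mismatch: {max_err:.2e}")
```

The step size `h=1e-5` is an arbitrary choice. Because $P_{ij}$ is a polynomial of degree at most 3 in $A_{ij}$, the central difference is very accurate here, so the printed mismatch is dominated by floating-point rounding.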