Derivative (Jacobian) of a matrix equation


I have this equation:

$y = e^{t(A + W)} x_0 $

where $A$ is a diagonal matrix and $W$ is a symmetric matrix.

I need to find $\frac{\partial y}{\partial W}$.

If $A$ and $W$ commuted, I could use the fact that

$e^{t(A + W)} = e^{tA} \cdot e^{tW}$

and then use the Kronecker product:

$\frac{\partial y}{\partial W} = x_0^T \otimes e^{tA} \, \operatorname{vec}(t\, e^{tW})$

But I can't derive the case where they don't commute.

Any thoughts? Thanks.

Accepted answer

$ \def\k{\otimes} \def\h{\odot} \def\BR#1{\Big[#1\Big]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\qiq{\quad\implies\quad} \def\l{\lambda} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\R{{\large\cal R}} \def\x{x_0} \def\c#1{\color{red}{#1}} \def\J{{\cal J}} $Construct a new symmetric matrix variable and calculate its eigendecomposition $$\eqalign{ S &= \LR{A+W} &\qiq dS=dW \quad \{ \c{\sf\,differential\,} \} \\ S &= QLQ^T&\qiq I=Q^TQ,\;\;L=\Diag{\l_k} \\ }$$ Given a differentiable function $f,\,$ the $\sf Daleckii$ - $\sf Krein\ Theorem$ says $$\eqalign{ F &= f(S) \\ dF &= Q\,\BR{R\h\LR{Q^TdS\,Q}}\,Q^T \\ {R_{jk}} &= \begin{cases} {\large\frac{f(\l_j)-f(\l_k)}{\l_j-\l_k}} \qquad {\rm if}\;\l_j\ne\l_k \\ \\ \quad f'(\l_k) \qquad\quad\; {\rm otherwise} \\ \end{cases} }$$ where $\h$ denotes the Hadamard product.
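The Daleckii–Krein differential above is easy to verify numerically against a central finite difference of the matrix exponential. Here is a minimal NumPy sketch (the helper names `sym_expm` and `dk_differential` are mine, not part of the theorem):

```python
import numpy as np

def sym_expm(S, t):
    """exp(t*S) for symmetric S via the eigendecomposition S = Q L Q^T."""
    lam, Q = np.linalg.eigh(S)
    return Q @ np.diag(np.exp(t * lam)) @ Q.T

def dk_differential(S, dS, t):
    """Daleckii-Krein: dF = Q [ R o (Q^T dS Q) ] Q^T for f(l) = exp(t*l)."""
    lam, Q = np.linalg.eigh(S)
    f = np.exp(t * lam)
    den = lam[:, None] - lam[None, :]          # l_j - l_k
    num = f[:, None] - f[None, :]              # f(l_j) - f(l_k)
    with np.errstate(divide="ignore", invalid="ignore"):
        R = num / den                          # divided differences
    # where l_j == l_k (e.g. the diagonal), use f'(l_k) = t * exp(t*l_k)
    close = np.abs(den) < 1e-12
    R[close] = np.broadcast_to(t * f, R.shape)[close]
    return Q @ (R * (Q.T @ dS @ Q)) @ Q.T
```

The `close` mask handles the otherwise-undefined $0/0$ entries of $R$ with the derivative branch of the theorem.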

In the current problem, the function is pretty simple $$ f(\l)=e^{t\l},\qquad\quad f'(\l)=te^{t\l} $$ Use this to calculate the Jacobian $(\J)$ of $y$ $$\eqalign{ \def\J{{\cal J}} y &= F\x \\ dy &= dF\,\x \\ &= Q\BR{R\h\LR{Q^TdS\,Q}}Q^T\x \\ &= \vc{Q\BR{R\h\LR{Q^TdW\,Q}}Q^T\x} \\ &= \LR{\x^TQ\k Q}\cdot\c{\Diag{\vc{R}}}\cdot\vc{Q^TdW\,Q} \\ &= \LR{\x^TQ\k Q}\cdot\c\R\cdot\LR{Q\k Q}^T\vc{dW} \\ \grad{y}{\vc W} &= \LR{\x\k I}^T\LR{Q\k Q}\,\R\;\LR{Q\k Q}^T \;\equiv\; \J \\ }$$ Without vectorization the gradient is a third-order tensor, which cannot be expressed in matrix notation.
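The vectorized Jacobian on the last line can also be checked directly: build $\J$ with Kronecker products and compare $\J\,\vc{dW}$ against $dy$ computed from the Hadamard form. A NumPy sketch using column-major `vec` (function names are mine):

```python
import numpy as np

def dk_R(lam, t):
    """Daleckii-Krein divided-difference matrix R for f(l) = exp(t*l)."""
    f = np.exp(t * lam)
    den = lam[:, None] - lam[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        R = (f[:, None] - f[None, :]) / den
    close = np.abs(den) < 1e-12
    R[close] = np.broadcast_to(t * f, R.shape)[close]
    return R

def jacobian(S, x0, t):
    """J = (x0^T Q kron Q) Diag(vec R) (Q kron Q)^T,  shape (n, n^2)."""
    lam, Q = np.linalg.eigh(S)
    R = dk_R(lam, t)
    return (np.kron((x0 @ Q)[None, :], Q)
            @ np.diag(R.ravel(order="F"))
            @ np.kron(Q, Q).T)
```

Note that `order="F"` matches the column-stacking `vec` convention used with $\vc{AXB} = (B^T\k A)\vc{X}$.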

Update

Since $y$ is a vector, you cannot optimize against it directly. So I'll assume that you are optimizing a scalar cost function such as $\phi = \tfrac12 y^Ty$.

Use the preceding results to calculate the gradient $(G)$ of this cost function $$\eqalign{ d\phi &= y^Tdy \\ &= F\x:dy \\ &= F\x:\LR{Q\BR{R\h\LR{Q^TdW\,Q}}Q^T\x} \\ &= \LR{Q^TF\x\x^TQ}:\BR{R\h\LR{Q^TdW\,Q}} \\ &= Q\BR{R\h Q^TF\x\x^TQ}Q^T:dW \\ \grad{\phi}{W} &= Q\BR{R\h Q^TF\x\x^TQ}Q^T \;\equiv\; G \\ }$$ Unlike the Jacobian, $G$ has the same shape as $W$ and is what you'd use for gradient descent $$\eqalign{ \def\o{{\tt1}} G_k &= G(W_k) \\ W_{k+\o} &= W_k - \eta_k\,G_k \\ k &= k+\o \\ }$$ $\sf NB\!:\:$ The Frobenius product $(:)$ was used in the gradient calculation $$A:B = I:\LR{A^TB} = \trace{A^TB}$$ It commutes with the Hadamard product and is indispensable for such calculations.
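The gradient $G$ can be verified the same way: the directional derivative of $\phi$ along a symmetric perturbation $dW$ should equal the Frobenius inner product $G:dW$. A NumPy sketch (names mine):

```python
import numpy as np

def phi(W, A, x0, t):
    """phi = 0.5 * ||exp(t*(A+W)) x0||^2, with A+W symmetric."""
    lam, Q = np.linalg.eigh(A + W)
    y = Q @ (np.exp(t * lam) * (Q.T @ x0))
    return 0.5 * y @ y

def grad_phi(W, A, x0, t):
    """G = Q [ R o (Q^T F x0 x0^T Q) ] Q^T."""
    lam, Q = np.linalg.eigh(A + W)
    f = np.exp(t * lam)
    den = lam[:, None] - lam[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        R = (f[:, None] - f[None, :]) / den
    close = np.abs(den) < 1e-12
    R[close] = np.broadcast_to(t * f, R.shape)[close]
    F = Q @ (f[:, None] * Q.T)            # F = exp(t*(A+W))
    return Q @ (R * (Q.T @ F @ np.outer(x0, x0) @ Q)) @ Q.T
```

Unlike the Jacobian, `grad_phi` returns an $n\times n$ matrix with the same shape as $W$, so it can be plugged straight into the descent update $W_{k+1} = W_k - \eta_k G_k$.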

Another answer

When $A$ and $W$ do not commute, the product of exponentials picks up correction terms built from nested commutators: $$e^{A}\,e^{W} = e^{\,A + W + \tfrac12[A,W] + \tfrac1{12}[A,[A,W]] + \tfrac1{12}[W,[W,A]] + \cdots},$$ the so-called Baker–Campbell–Hausdorff formula, whose higher-order terms are $n$-fold nested commutators of $A$ and $W$.
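For small-norm matrices the truncated series is easy to check numerically: including the commutator corrections shrinks the residual between $e^{A}e^{W}$ and $e^{Z}$ compared with using $Z = A+W$ alone. A NumPy sketch with a third-order truncation (helper names mine; a plain Taylor series suffices for the matrix exponential at these norms):

```python
import numpy as np

def expm_taylor(M, terms=40):
    """Matrix exponential via its Taylor series (fine for small ||M||)."""
    E = np.eye(len(M))
    T = np.eye(len(M))
    for k in range(1, terms):
        T = T @ M / k
        E = E + T
    return E

def comm(X, Y):
    return X @ Y - Y @ X

def bch3(A, W):
    """Baker-Campbell-Hausdorff series truncated at third order."""
    return (A + W + 0.5 * comm(A, W)
            + comm(A, comm(A, W)) / 12.0
            + comm(W, comm(W, A)) / 12.0)
```

With matrices of norm $\varepsilon$, the residual drops from $O(\varepsilon^2)$ (ignoring the commutators) to $O(\varepsilon^4)$ (keeping them through third order).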