Partial derivative of the trace of matrix entry-wise exponential?


I am checking my math here and would like some help with the exponential part.

$\renewcommand{\v}[1]{\mathrm{vec}\left(#1\right)} \renewcommand{\m}[1]{\mathbf{#1}} \renewcommand{\trace}[1]{\mathrm{trace}\left(#1\right)} \renewcommand{\diag}[1]{\mathrm{diag}\left(#1\right)}$ Given that $\m X$ is an $n \times m$ matrix, what is the following partial derivative? $$\frac{\partial \trace{\exp(\mathbf X\mathbf X^\top)}}{\partial\m X}$$

We have that $\trace{\mathbf F(\mathbf X)} = \v{\mathbf F(\mathbf X)}^\top \v{\mathbf I}$, so its derivative is $\v{\partial \m F(\m X)}^\top\v{\m I}$.
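The identity $\mathrm{trace}(\mathbf F)=\mathrm{vec}(\mathbf F)^\top\mathrm{vec}(\mathbf I)$ can be sanity-checked numerically. This is a minimal sketch assuming NumPy; the random matrix `F` is an arbitrary stand-in for $\mathbf F(\mathbf X)$, and `vec` is a hypothetical helper for column stacking.

```python
import numpy as np

def vec(M):
    """Hypothetical helper: column-stacking vectorization (Fortran order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4))  # arbitrary square matrix standing in for F(X)
I = np.eye(4)

lhs = np.trace(F)
rhs = vec(F) @ vec(I)            # vec(F)^T vec(I) keeps only the diagonal entries of F
print(np.isclose(lhs, rhs))      # True
```

Because the map is linear in $\mathbf F$, the same identity applied to $d\mathbf F$ gives the differential form used in the question.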

We also have $\partial \exp f(x) / \partial x = \exp f(x)\partial f(x)/\partial x$.

I guess the partial derivative $\partial \exp \m X \m X^\top / \partial \m X$ must be something like $$\diag{\v{\exp \m X \m X^\top}}(\m I \otimes \m X+\m X \boxtimes \m I)$$

So in the end, the derivative I'm looking for must be equal to $2\m X^\top\exp(\m X\m X^\top)$ ?
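Before proving the proposed result, one can test it with a central finite difference. This sketch assumes NumPy and assumes $\exp$ denotes the matrix exponential (as the answers treat it); `expm_sym` and `G` are hypothetical helper names, and the eigendecomposition shortcut is valid only because $\mathbf X\mathbf X^\top$ is symmetric. The candidate matrix `M` is paired with a direction `Z` through $\mathrm{tr}(\mathbf Z\mathbf M)$, matching the layout of $2\mathbf X^\top\exp(\mathbf X\mathbf X^\top)$.

```python
import numpy as np

def expm_sym(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T  # V diag(exp(w)) V^T

def G(X):
    """G(X) = trace(exp(X X^T)) with exp the matrix exponential."""
    return np.trace(expm_sym(X @ X.T))

rng = np.random.default_rng(1)
n, m = 4, 3
X = 0.5 * rng.standard_normal((n, m))
Z = rng.standard_normal((n, m))    # arbitrary direction, same shape as X

M = 2 * X.T @ expm_sym(X @ X.T)    # proposed answer, an m x n matrix

h = 1e-6
fd = (G(X + h * Z) - G(X - h * Z)) / (2 * h)  # central difference along Z
exact = np.trace(Z @ M)                       # directional derivative tr(Z M)
print(np.isclose(fd, exact))                  # True
```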

BEST ANSWER

The final expression is correct in the sense that if $G:X\mapsto\mathrm{tr}(\exp(XX^T))$, then $$\langle\nabla G(X),Z\rangle=2\mathrm{tr}(ZX^T\exp(XX^T))=2\mathrm{tr}(\exp(XX^T)XZ^T).$$

Thus, if one identifies the gradient $\nabla G(X)$ with the matrix $M_X$ such that $$\langle\nabla G(X),Z\rangle=\mathrm{tr}(ZM_X),$$ then indeed the gradient is $$2X^T\exp(XX^T)=2\exp(X^TX)X^T.$$ A more logical convention would be to identify the gradient $\nabla G(X)$ with the matrix $M_X$ such that $$\langle\nabla G(X),Z\rangle=\mathrm{tr}(ZM_X^T)=\mathrm{tr}(M_XZ^T),$$ in which case the gradient would rather be $$2\exp(XX^T)X=2X\exp(X^TX).$$

The intermediate steps proposed in the question are opaque to me. A direct proof of the result simply uses the characterization of the differential $\nabla G(X)$ at $X$ as the unique linear functional on the space of $n\times m$ matrices such that, for every such matrix $Z$ and real number $h$, when $h\to0$, $$G(X+hZ)=G(X)+\langle\nabla G(X),Z\rangle h+o(h).$$ Here $(X+hZ)(X+hZ)^T=XX^T+h(ZX^T+XZ^T)+o(h)$ and, for square matrices $Y$ and $E$, $\mathrm{tr}(\exp(Y+hE))=\mathrm{tr}(\exp(Y))+h\,\mathrm{tr}(\exp(Y)E)+o(h)$. Since $\exp(XX^T)$ is symmetric, $\mathrm{tr}(\exp(XX^T)ZX^T)=\mathrm{tr}(\exp(XX^T)XZ^T)$, which gives the formula above.
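The characterization $G(X+hZ)=G(X)+\langle\nabla G(X),Z\rangle h+o(h)$ can be illustrated numerically under the second convention, where $\langle A,B\rangle=\mathrm{tr}(AB^T)$ is the entrywise (Frobenius) product. This NumPy sketch assumes $\exp$ is the matrix exponential; `expm_sym` and `G` are hypothetical helpers. It checks that the two forms $2\exp(XX^T)X$ and $2X\exp(X^TX)$ agree and that the remainder divided by $h$ shrinks as $h\to0$.

```python
import numpy as np

def expm_sym(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def G(X):
    """G(X) = trace(exp(X X^T))."""
    return np.trace(expm_sym(X @ X.T))

rng = np.random.default_rng(2)
X = 0.5 * rng.standard_normal((4, 3))
Z = rng.standard_normal((4, 3))

# Gradient under <A, B> = tr(A B^T) = sum(A * B)
grad = 2 * expm_sym(X @ X.T) @ X
grad_alt = 2 * X @ expm_sym(X.T @ X)  # equivalent second form
print(np.allclose(grad, grad_alt))    # True

# Remainder G(X+hZ) - G(X) - <grad, Z> h should be o(h)
ratios = []
for h in [1e-1, 1e-2, 1e-3]:
    remainder = G(X + h * Z) - G(X) - np.sum(grad * Z) * h
    ratios.append(abs(remainder) / h)
print(ratios)  # decreasing as h shrinks
```

Here $h\mapsto G(X+hZ)$ is convex, so the remainder ratio decreases monotonically with $h$; the finite $h$ values are arbitrary choices.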

ANOTHER ANSWER

For a scalar function $f$ with derivative $f'$, applied to a square matrix argument $Y$, the differential of the trace is $$ \eqalign { d\,{\rm tr}(f(Y)) &= f'(Y^T):dY \cr } $$ where the colon denotes the trace (Frobenius) product $A:B={\rm tr}(A^TB)$.

The exponential function is its own derivative, so $f'=f$ and $$ \eqalign { d\,{\rm tr}({\rm exp}(Y)) &= {\rm exp}(Y^T):dY \cr } $$

In the case that $Y=XX^T$, we have $Y=Y^T$ and $$ \eqalign { d\,{\rm tr}({\rm exp}(Y)) &= {\rm exp}(Y):(dX\,X^T + X\,dX^T) \cr &= {\rm exp}(Y):2\,{\rm sym}(dX\,X^T) \cr &= 2\,{\rm sym}({\rm exp}(Y)):dX\,X^T \cr &= 2\,{\rm exp}(Y)\,X:dX \cr &= 2\,{\rm exp}(XX^T)\,X:dX \cr } $$ where ${\rm sym}(A)=\tfrac12(A+A^T)$.

The derivative is therefore $$ \eqalign { \frac {\partial\,{\rm tr}({\rm exp}(Y))} {\partial X} &= 2\,{\rm exp}(XX^T)\,X \cr &= 2\,X\,{\rm exp}(X^TX) \cr } $$ where the final equality follows from this general property of matrix functions $$ \eqalign { f(AB^T)\,A &= A\,f(B^TA) \cr } $$ where $A,B \in {\mathbb R}^{m\times n}$. It holds, for example, for any $f$ defined by a power series, such as $\exp$, since $(AB^T)^kA=A(B^TA)^k$.
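The two facts this answer relies on, $f(AB^T)A=Af(B^TA)$ and $d\,{\rm tr}(\exp(Y))=\exp(Y^T):dY$, can be spot-checked numerically. Because $AB^T$ and the test matrix $Y$ are not symmetric, this NumPy sketch uses a hand-rolled `expm` (scaling and squaring with a truncated Taylor series); `expm` and `frob` are hypothetical helpers, and in practice `scipy.linalg.expm` would be the usual choice.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2**s                      # scaled so that ||A||_1 <= 1/2
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k           # A^k / k!
        result = result + term
    for _ in range(s):
        result = result @ result      # undo the scaling: exp(M) = exp(A)^(2^s)
    return result

def frob(P, Q):
    """Colon product P : Q = tr(P^T Q)."""
    return np.sum(P * Q)

rng = np.random.default_rng(3)
A = 0.5 * rng.standard_normal((4, 3))
B = 0.5 * rng.standard_normal((4, 3))

# f(A B^T) A = A f(B^T A) for f = exp (A B^T is 4x4, B^T A is 3x3)
lhs = expm(A @ B.T) @ A
rhs = A @ expm(B.T @ A)
print(np.allclose(lhs, rhs))  # True

# d tr(exp(Y)) = exp(Y^T) : dY, checked for a non-symmetric Y
Y = 0.5 * rng.standard_normal((4, 4))
dY = rng.standard_normal((4, 4))
h = 1e-6
fd = (np.trace(expm(Y + h * dY)) - np.trace(expm(Y - h * dY))) / (2 * h)
exact = frob(expm(Y.T), dY)
print(np.isclose(fd, exact))  # True
```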