I’m asked to prove $\frac{d}{dt}\Big|_{t=0}\mbox{tr}(e^{X+tY})=\mbox{tr}(e^XY)$ for any $X,Y$ in $M_n(\mathbb{C})$. My attempt is to assume that both $X$ and $Y$ are diagonalizable. Since the set of diagonalizable matrices is dense in $M_n(\mathbb{C})$, if we can prove the identity for diagonalizable matrices, then we are done. I expected this to simplify the proof, but it does not seem to work unless I further assume that $X$ and $Y$ are simultaneously diagonalizable. Any suggestions?
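For comparison, here is why the special case mentioned in the question is easy. Assume (this is exactly the extra hypothesis the question wants to avoid) that $X=PDP^{-1}$ and $Y=PEP^{-1}$ for one invertible $P$, with $D=\mbox{diag}(\lambda_1,\dots,\lambda_n)$ and $E=\mbox{diag}(\mu_1,\dots,\mu_n)$:

$$\eqalign{ \mbox{tr}(e^{X+tY}) &= \mbox{tr}\big(P\,e^{D+tE}P^{-1}\big) = \sum_{i=1}^n e^{\lambda_i+t\mu_i}, \\ \frac{d}{dt}\Big|_{t=0}\mbox{tr}(e^{X+tY}) &= \sum_{i=1}^n e^{\lambda_i}\mu_i = \mbox{tr}(e^DE) = \mbox{tr}\big(Pe^DP^{-1}\,PEP^{-1}\big) = \mbox{tr}(e^XY). }$$

Without a common eigenbasis $P$ (for diagonalizable $X,Y$ this means $XY\neq YX$), the first equality is not available.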
Prove $\frac{d}{dt} \Big|_{t=0}\mbox{tr}(e^{X+tY})=\mbox{tr}(e^XY)$
Asked by Bumbble Comm. There are 4 answers below.
Answer
We may assume $X \neq 0$ (the bound below divides by $\lVert X \rVert$). Put $f_m(t)= \frac{1}{m!}\mbox{tr}\big((X+tY)^m\big)$. Then $\mbox{tr}(e^{X+tY})=\sum_{m=0}^{\infty} f_m(t)$, and we want to justify interchanging the differentiation and the infinite sum.
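Let's find an upper bound for $|f_m'(t)|$ using the Frobenius norm $\lVert \cdot \rVert$. Recall that $|\mbox{tr}(A)| \leq \sqrt{n} \cdot \lVert A \rVert$ and $\lVert AB \rVert \leq \lVert A \rVert\,\lVert B \rVert$ hold for all $A, B \in M_n(\mathbb C)$. Pick $a>0$ and suppose $t \in [-a, a]\setminus \{0\}$. Expanding $(X+tY)^m$ as a sum of products of $m$ factors, each equal to $X$ or $tY$ (exactly $\binom{m}{r}$ of these products contain $r$ factors $tY$), and differentiating in $t$, we get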
\begin{align*} |f_m'(t)| &\leq \frac{\sqrt{n}}{m!}\sum_{r=1}^{m} \binom{m}{r}r|t|^{r-1} \lVert X\rVert^{m-r} \lVert Y\rVert^r \\ &\leq \frac{\sqrt{n}}{m!}\sum_{r=1}^{m} \binom{m}{r}ra^{r-1} \lVert X\rVert^{m-r} \lVert Y\rVert^r \\ &=\sqrt{n}\cdot\frac{\lVert X \rVert^m}{m!}\sum_{r=1}^{m} \binom{m}{r}ra^{r-1} \lVert X\rVert^{-r} \lVert Y\rVert^r \\&= \sqrt{n}\cdot\frac{\lVert X \rVert^m}{m!} \frac{d}{da} (1+a\lVert X \rVert^{-1} \lVert Y \rVert )^m \\&= \sqrt{n}\cdot\frac{\lVert X \rVert^m}{m!}\cdot m\,(1+a\lVert X \rVert^{-1} \lVert Y \rVert )^{m-1}\,\lVert X \rVert^{-1}\lVert Y \rVert \\&= \sqrt{n} \cdot \lVert Y \rVert\cdot\frac{(\lVert X \rVert+a \lVert Y \rVert )^{m-1}}{(m-1)!} \\ &:=M_m\end{align*}
for $m \geq 1$; since $f_0$ is constant, put $M_0:=0$. Observe that \begin{align*}\sum_{m=0}^{\infty} M_m = \sqrt{n} \cdot \lVert Y \rVert\sum_{m=1}^{\infty}\frac{(\lVert X \rVert+a \lVert Y \rVert )^{m-1}}{(m-1)!} = \sqrt{n} \cdot \lVert Y \rVert \cdot\exp(\lVert X \rVert+a \lVert Y \rVert) < \infty \end{align*}
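A quick numerical sanity check of the final inequality $|f_m'(t)| \leq M_m$ and of the sum of the $M_m$ might look like the sketch below. It is only a sketch under stated assumptions: pure Python (no NumPy), real random $3\times 3$ test matrices, a grid of points in $[-a,a]$, and the helper names `fm_prime` and `bound_M` are ours, not part of the answer. `fm_prime` uses only the product rule, so it does not rely on the cyclicity of the trace.

```python
import math
import random


def matmul(A, B):
    # Product of two square matrices stored as lists of rows.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def frob(A):
    # Frobenius norm: square root of the sum of squared entry moduli.
    return math.sqrt(sum(abs(x) ** 2 for row in A for x in row))


def mpow(A, k):
    P = identity(len(A))
    for _ in range(k):
        P = matmul(P, A)
    return P


def fm_prime(X, Y, t, m):
    # f_m(t) = tr((X+tY)^m)/m!; the product rule gives
    # f_m'(t) = (1/m!) * sum_j tr(A^j Y A^(m-1-j)) with A = X + tY.
    A = [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    s = sum(trace(matmul(matmul(mpow(A, j), Y), mpow(A, m - 1 - j))) for j in range(m))
    return s / math.factorial(m)


def bound_M(X, Y, a, m):
    # Majorant M_m = sqrt(n) ||Y|| (||X|| + a||Y||)^(m-1) / (m-1)!, for m >= 1.
    return math.sqrt(len(X)) * frob(Y) * (frob(X) + a * frob(Y)) ** (m - 1) / math.factorial(m - 1)


random.seed(0)
n, a = 3, 1.5
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
Y = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
grid = [-a + 2 * a * i / 20 for i in range(21)]
worst = max(abs(fm_prime(X, Y, t, m)) / bound_M(X, Y, a, m) for m in range(1, 12) for t in grid)
print(f"largest ratio |f_m'(t)| / M_m on the grid: {worst:.3f}")  # should not exceed 1
```

Summing `bound_M` over $m$ should reproduce $\sqrt{n}\,\lVert Y\rVert\exp(\lVert X\rVert+a\lVert Y\rVert)$ up to rounding.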
By the Weierstrass M-test, $\sum_{m=0}^{\infty}f_m'$ converges uniformly on $[-a, a] \setminus \{ 0\}$. At $t=0$, the product rule and the cyclic property of the trace give \begin{align} &f_m'(0)=\frac{\mbox{tr}(X^{m-1}Y)}{(m-1)!} \tag{if $m \geq 1$} \\ &f_0'(0) = 0 \end{align}
whence $\sum_{m=0}^{\infty} f_m'(0)= \mbox{tr}\left(e^X Y \right)$. In particular the series also converges at $t=0$, so $\sum_{m=0}^{\infty}f_m'$ converges uniformly on all of $[-a, a]$. Since $\sum_{m=0}^{\infty}f_m$ converges as well, the theorem on term-by-term differentiation of series shows that $f(t):=\mbox{tr}(e^{X+tY})=\sum_{m=0}^{\infty}f_m(t)$ satisfies $f'(t) = \sum_{m=0}^{\infty} f_m'(t)$ for all $t \in [-a, a]$. In particular, $f'(0)=\sum_{m=0}^{\infty}f_m'(0) = \mbox{tr}(e^X Y)$.
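The value of $f_m'(0)$ used above can be spelled out for $m \geq 1$: differentiate each product by the product rule, then move the leading powers of $X$ to the back with the cyclic property of the trace:

$$\eqalign{ f_m'(0) &= \frac{1}{m!}\sum_{j=0}^{m-1}\mbox{tr}\big(X^{j}\,Y\,X^{m-1-j}\big) = \frac{1}{m!}\sum_{j=0}^{m-1}\mbox{tr}\big(X^{m-1}Y\big) = \frac{\mbox{tr}(X^{m-1}Y)}{(m-1)!}, \\ \sum_{m=1}^{\infty} f_m'(0) &= \mbox{tr}\Big(\sum_{k=0}^{\infty}\frac{X^k}{k!}\,Y\Big) = \mbox{tr}(e^XY). }$$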
Answer
For ease of typing, define $$\eqalign{ A&= A(t) = X+tY \quad\implies\quad &dA = Y\,dt \\ &&X = \lim_{t\to 0}\,A(t) \\ }$$ The Frobenius product notation for the trace will also prove convenient (for complex matrices it is a bilinear pairing, with no complex conjugation) $$A:B = {\rm Tr}(A^TB) = B:A$$ First, note that the Taylor series is valid for the matrix exponential, so its differential can be computed term by term $$\eqalign{ e^A = \sum_{k=0}^\infty \frac{A^k}{k!} \quad\implies\quad de^A &= \sum_{k=0}^\infty\frac{1}{k!}\sum_{j=0}^{k-1} A^j\,dA\,A^{k-1-j} \\ }$$ Next, write the objective function and calculate its differential, gradient and derivative. $$\eqalign{ \phi &= {\rm Tr}(Ie^A) \\ &= I:e^A \\ d\phi &= I:de^A \\ &= I:\left[\sum_{k=0}^\infty\frac{1}{k!}\sum_{j=0}^{k-1} A^j\,dA\,A^{k-1-j}\right] \\ &= \left[\sum_{k=0}^\infty\frac{1}{k!}\sum_{j=0}^{k-1} A^{k-1-j}\,I\,A^j\right]^T:dA \\ &= \left[\sum_{k=0}^\infty\frac{1}{k!}\sum_{j=0}^{k-1} A^{k-1}\right]^T:dA \\ &= \left[\sum_{k=1}^\infty\frac{A^{k-1}}{(k-1)!}\right]^T:dA \\ &= \left[\sum_{k=0}^\infty\frac{A^k}{k!}\right]^T:dA \qquad\implies \frac{\partial\phi}{\partial A} = \left[e^A\right]^T \\ &= \left[e^A\right]^T:Y\,dt \\ \frac{d\phi}{dt} &= {\rm Tr}(e^AY) \\ }$$ Finally, take the limit as $t\to 0$ (equivalently, set $t=0$, since ${\rm Tr}(e^{A(t)}Y)$ is continuous in $t$). $$\eqalign{ \lim_{t\to 0}\left(\frac{d\phi}{dt}\right) &= \lim_{t\to 0}\,{\rm Tr}\left(e^AY\right) \\ &= {\rm Tr}\left(e^XY\right) \\ }$$
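The series for $de^A$ can be checked numerically along the direction $dA = Y$. The sketch below assumes pure Python with real random test matrices and a truncated Taylor series for the exponential (the helpers `expm` and `dexp_series` are hypothetical names of ours). It compares a central-difference estimate of the directional derivative with the truncated double sum, then compares the trace of that sum with ${\rm Tr}(e^AY)$.

```python
import random


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def madd(A, B, s=1.0):
    # Entrywise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def expm(A, terms=40):
    # Truncated Taylor series sum_{k < terms} A^k / k!; adequate for small ||A||.
    result, term = identity(len(A)), identity(len(A))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = madd(result, term)
    return result


def dexp_series(A, dA, terms=40):
    # Truncation of sum_k (1/k!) sum_{j=0}^{k-1} A^j dA A^(k-1-j).
    powers = [identity(len(A))]
    for _ in range(terms):
        powers.append(matmul(powers[-1], A))
    total = [[0.0] * len(A) for _ in A]
    fact = 1.0
    for k in range(1, terms):
        fact *= k
        for j in range(k):
            total = madd(total, matmul(matmul(powers[j], dA), powers[k - 1 - j]), 1.0 / fact)
    return total


random.seed(1)
n, h = 3, 1e-5
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
Y = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
# Central difference (e^{A+hY} - e^{A-hY}) / (2h) approximates de^A in the direction dA = Y.
fd = [[x / (2 * h) for x in row] for row in madd(expm(madd(A, Y, h)), expm(madd(A, Y, -h)), -1.0)]
series = dexp_series(A, Y)
err = max(abs(u - v) for ru, rv in zip(fd, series) for u, v in zip(ru, rv))
print(f"max entrywise difference: {err:.1e}")
print(trace(series), trace(matmul(expm(A), Y)))  # Tr(de^A / dt) vs Tr(e^A Y)
```

The two traces agree even though the matrix $de^A/dt$ itself is generally not $e^AY$; only the trace collapses the inner sum to $k\,A^{k-1}$.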
The rearrangement property of the Frobenius product was used in some of the steps above, e.g. $$\eqalign{ A:BCD &= B^TA:CD \\ &= AD^T:BC \\ &= B^TAD^T:C \\ }$$ These follow directly from the cyclic property of the trace, together with $(MN)^T = N^TM^T$: $$ {\rm Tr}(ABCD) = {\rm Tr}(BCDA) = {\rm Tr}(CDAB) = \cdots $$
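The rearrangement rules can be spot-checked with random real matrices. This is a minimal sketch assuming pure Python; `frob_prod` is our own name for the product $A:B={\rm Tr}(A^TB)$, computed entrywise.

```python
import random


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_prod(A, B):
    # A:B = Tr(A^T B) = sum_ij A_ij B_ij  (bilinear: no complex conjugation)
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def rand_matrix(n=3):
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]


random.seed(2)
A, B, C, D = (rand_matrix() for _ in range(4))
lhs = frob_prod(A, matmul(matmul(B, C), D))  # A:BCD
rhs = [
    frob_prod(matmul(transpose(B), A), matmul(C, D)),  # B^T A : CD
    frob_prod(matmul(A, transpose(D)), matmul(B, C)),  # A D^T : BC
    frob_prod(matmul(matmul(transpose(B), A), transpose(D)), C),  # B^T A D^T : C
]
print(lhs, rhs)  # all four numbers agree up to rounding
```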
UPDATE
The above is a special case of a formula, valid for any analytic function $\;f(z)$, which can be found on page 12 of the Matrix Cookbook: $$\eqalign{ \frac{\partial\,{\rm Tr}(f(A))}{\partial A} &= f^\prime(A)^T \\ }$$ Let $f(z)=e^z$; then $f'(z)=e^z\;$ and the whole derivation reduces to $$\eqalign{ \phi &= {\rm Tr}(e^A) \\ d\phi &= (e^A)^T:dA = {\rm Tr}(e^AY)\,dt \\ \frac{d\phi}{dt} &= {\rm Tr}(e^AY) \\ \lim_{t\to 0}\left(\frac{d\phi}{dt}\right) &= {\rm Tr}(e^XY) \\ }$$
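Contracting the gradient formula with $dA = Y\,dt$ gives $\frac{d}{dt}\big|_{t=0}{\rm Tr}(f(X+tY)) = {\rm Tr}(f'(X)Y)$. The sketch below checks this exactly for a polynomial, under our own choices: $f(z)=z^3$ (so $f'(z)=3z^2$), random integer $4\times 4$ matrices, and Python's `fractions.Fraction` so that no rounding is involved. Since ${\rm Tr}((X+tY)^3)$ is a cubic in $t$, its derivative at $0$ can be recovered exactly from four sample values.

```python
import random
from fractions import Fraction


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def p(X, Y, t):
    # p(t) = Tr(f(X + tY)) for f(z) = z^3; a cubic polynomial in t.
    A = [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    return trace(matmul(matmul(A, A), A))


random.seed(3)
n = 4
X = [[Fraction(random.randint(-5, 5)) for _ in range(n)] for _ in range(n)]
Y = [[Fraction(random.randint(-5, 5)) for _ in range(n)] for _ in range(n)]
# For a cubic p(t) = c0 + c1 t + c2 t^2 + c3 t^3 the exact value p'(0) = c1 is
# (8 (p(1) - p(-1)) - (p(2) - p(-2))) / 12.
deriv = (8 * (p(X, Y, 1) - p(X, Y, -1)) - (p(X, Y, 2) - p(X, Y, -2))) / 12
predicted = trace(matmul([[3 * x for x in row] for row in matmul(X, X)], Y))  # Tr(f'(X) Y)
print(deriv, predicted)  # identical Fractions
```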
Answer
This is truly only a side comment, irrelevant to anybody's immediate needs, but I think (I hope!) it is worth making, and it is too long to fit in the comment section.
From the Lie theoretic point of view, $${d\over dt}\Bigg\vert_{t=0}\exp (X+tY)= d\exp_X(Y),$$ where $d\exp_X$ denotes the derivative of the (Lie theoretic!) exponential function $$\exp\colon {\mathfrak g} \to G,$$ at $X\in {\mathfrak g} = M_n({\mathbb C})$, with $G = GL_n({\mathbb C})$, the invertible matrices, and $Y\in T_X{\mathfrak g}$.
It turns out that, suitably interpreted, $$ d\exp_X (Y)= g(\text{ad }X)Y \cdot \exp X, $$ where ad is the adjoint operator, $(\text{ad }X)Y = [X,Y] = XY-YX$, and $$g(z) = {e^z -1 \over z } = \sum_{k=0}^\infty \frac{z^k}{(k+1)!},$$ so that $g(\text{ad }X)$ is defined by this power series. ("Suitably interpreted": for instance, above, the multiplication on the right by $\exp X$ on the tangent space is the 'functorial' operation arising from multiplication on the right on the Lie group.)
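For matrices the formula can be tested directly: the left side is the directional derivative of the matrix exponential, the right side is the power series of $g(\text{ad }X)$ applied to $Y$, multiplied on the right by $\exp X$. This is a rough sketch assuming pure Python, real random test matrices, truncated series, and a central difference; `ad` and `g_of_ad` are hypothetical helper names.

```python
import math
import random


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def madd(A, B, s=1.0):
    # Entrywise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def expm(A, terms=40):
    # Truncated Taylor series sum_{k < terms} A^k / k!.
    result, term = identity(len(A)), identity(len(A))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = madd(result, term)
    return result


def ad(X, Z):
    # (ad X) Z = [X, Z] = XZ - ZX
    return madd(matmul(X, Z), matmul(Z, X), -1.0)


def g_of_ad(X, Y, terms=40):
    # g(ad X) Y with g(z) = (e^z - 1)/z = sum_k z^k / (k+1)!, truncated at k < terms.
    total = [[0.0] * len(X) for _ in X]
    term = Y
    for k in range(terms):
        total = madd(total, term, 1.0 / math.factorial(k + 1))
        term = ad(X, term)
    return total


random.seed(4)
n, h = 3, 1e-5
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
Y = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
# Left side: d/dt exp(X + tY) at t = 0, by a central difference.
lhs = [[x / (2 * h) for x in row] for row in madd(expm(madd(X, Y, h)), expm(madd(X, Y, -h)), -1.0)]
# Right side: g(ad X)Y multiplied on the right by exp X.
rhs = matmul(g_of_ad(X, Y), expm(X))
err = max(abs(u - v) for ru, rv in zip(lhs, rhs) for u, v in zip(ru, rv))
print(f"max entrywise difference: {err:.1e}")
```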
All of this turns up, for instance, in the (or 'a') proof of the Baker-Campbell-Hausdorff formula. For a proof that (sort of) avoids power series, arguably a bit heavy on notation but certainly more explicit than mine, see 4.26 and the following few sections of Natural Operations in Differential Geometry (Kolar, Michor, Slovak); this answer/comment is basically Lemma 4.27 there.
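Taking traces 'kills' ad: since $X$ commutes with $\exp X$, every term $(\text{ad }X)^kY\cdot\exp X$ with $k\geq 1$ has trace zero. One therefore ends up with your formula, except that $\exp X$ is on the right, which does not matter, since $\mbox{tr}(Y\exp X)=\mbox{tr}(\exp X\,Y)$. (I put it on the right to stay in keeping with this reference.)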
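To make the 'killing' explicit, expand $g(\text{ad }X)Y=\sum_{k\ge 0}\frac{(\text{ad }X)^kY}{(k+1)!}$ and, for $k\geq 1$, write $Z=(\text{ad }X)^{k-1}Y$, so that $(\text{ad }X)^kY = XZ-ZX$. Using the cyclic property of the trace and $\exp X\cdot X = X\exp X$:

$$\eqalign{ \mbox{tr}\big((\text{ad }X)^kY\cdot\exp X\big) &= \mbox{tr}\big(XZ\exp X\big)-\mbox{tr}\big(ZX\exp X\big) = \mbox{tr}\big(Z\exp X\,X\big)-\mbox{tr}\big(ZX\exp X\big) = 0, \\ \mbox{tr}\big(d\exp_X(Y)\big) &= \mbox{tr}(Y\exp X) = \mbox{tr}(e^XY). }$$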
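Answer
The trace is just the sum of the diagonal entries, hence linear, so you can put the derivative inside the trace: $$\frac{d}{dt}\Big|_{t=0}\mbox{tr}(e^{X+tY})=\mbox{tr}\Big(\frac{d}{dt}\Big|_{t=0}e^{X+tY}\Big)=\mbox{tr}(e^{X+tY}Y)\Big|_{t=0}=\mbox{tr}(e^XY)$$ Note that the second equality holds only after taking the trace: when $XY\neq YX$, $\frac{d}{dt}e^{X+tY}$ is in general not equal to $e^{X+tY}Y$, but its trace is, by differentiating the power series term by term and applying the cyclic property of the trace.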
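A small non-commuting example illustrates the point about the second equality. The sketch assumes pure Python, a truncated Taylor series for the exponential, and a central difference; the choice $X=\begin{pmatrix}0&1\\0&0\end{pmatrix}$, $Y=\begin{pmatrix}0&0\\1&0\end{pmatrix}$ is ours. Here $\frac{d}{dt}\big|_{t=0}e^{X+tY}=\int_0^1 e^{sX}Ye^{(1-s)X}\,ds=\begin{pmatrix}1/2&1/6\\1&1/2\end{pmatrix}$, which differs from $e^XY=\begin{pmatrix}1&0\\1&0\end{pmatrix}$, yet both have trace $1$.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def madd(A, B, s=1.0):
    # Entrywise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def expm(A, terms=40):
    # Truncated Taylor series sum_{k < terms} A^k / k!.
    result, term = identity(len(A)), identity(len(A))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = madd(result, term)
    return result


# Non-commuting pair: XY != YX.
X = [[0.0, 1.0], [0.0, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.0]]
h = 1e-5
# Central-difference approximation of d/dt e^{X+tY} at t = 0.
deriv = [[x / (2 * h) for x in row] for row in madd(expm(madd(X, Y, h)), expm(madd(X, Y, -h)), -1.0)]
naive = matmul(expm(X), Y)  # e^X Y
print(deriv)  # approximately [[0.5, 0.1667], [1.0, 0.5]]
print(naive)  # [[1.0, 0.0], [1.0, 0.0]]
print(trace(deriv), trace(naive))  # both approximately 1.0
```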