Differentiation of tr$(AD^{-3\alpha}B)$ with respect to scalar $\alpha$, $D$ is positive diagonal.


Please, may I know how to differentiate tr$(AD^{-3\alpha}B)$ with respect to scalar $\alpha$. I'm thinking of an approach with the Frobenius inner product, but I'm not conversant with the rules. I'm open to any convenient approach. Thanks.

Edit: Let $D$ be positive diagonal so we can have negative powers of $D$.


There are 2 best solutions below

1
On BEST ANSWER

Define the variables $$\eqalign{ L &= \log(D),\quad \beta = -3\alpha \cr }$$ One nice thing about diagonal matrices is that they can be manipulated almost like scalars. Consider the following diagonal matrix function and its differential. $$\eqalign{ F &= D^\beta = \exp(\beta L) \cr dF &= FL\,d\beta = -3FL\,d\alpha \cr }$$ Write the trace function in terms of the Frobenius (:) product and the $F$-function.
Then find its differential and derivative. Here $X:Y$ denotes $\sum_{ij} X_{ij}Y_{ij}$; because $F$ is diagonal, only the diagonal of $BA$ contributes, so $BA:F = {\rm Tr}(BAF)$. $$\eqalign{ \phi &= {\rm Tr}(AFB) = {\rm Tr}(BAF) = BA:F \cr d\phi &= BA:dF = -3BA:FL\,d\alpha \cr \frac{d\phi}{d\alpha} &= -3BA:FL \cr }$$ NB: You stated that the $D$-matrix was non-negative, but that's not good enough. For the logarithm (or negative powers) to make sense, there can be no zeros on its diagonal.
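If you want to sanity-check the result numerically, here is a minimal NumPy sketch. It is not part of the derivation. The random square matrices `A` and `B`, the positive diagonal `d`, the test point `alpha = 0.7`, the step `h`, and the helper names `phi` and `dphi` are all assumptions made for illustration. It compares the closed form $-3\,BA:FL$ with a central finite difference.

```python
import numpy as np

# Illustrative data (assumed, not from the question): square A, B and a
# strictly positive diagonal d, so log(d) and negative powers are defined.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
d = rng.uniform(0.5, 2.0, size=n)

def phi(alpha):
    # tr(A D^{-3 alpha} B); the power of a diagonal matrix is taken entrywise
    return np.trace(A @ np.diag(d ** (-3.0 * alpha)) @ B)

def dphi(alpha):
    # Closed form -3 * (BA : FL) = -3 * sum_i (BA)_ii * d_i^{-3 alpha} * log(d_i)
    F = d ** (-3.0 * alpha)
    return -3.0 * np.sum(np.diag(B @ A) * F * np.log(d))

alpha = 0.7
h = 1e-6
# Central finite-difference approximation of d(phi)/d(alpha)
fd = (phi(alpha + h) - phi(alpha - h)) / (2.0 * h)
print(dphi(alpha), fd)
```

The two printed numbers should agree to several digits. A large discrepancy would point to a sign or ordering slip in the formula.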

0
On

To answer this question, all that is needed is a systematic application of the chain rule in multiple dimensions. Hopefully you are familiar with differentiation as explained in Spivak's Calculus on Manifolds or Loomis and Sternberg's Advanced Calculus. The notation I'll use is very close to both these books. So for example,

If $V$ and $W$ are (finite-dimensional) normed vector spaces, and $f:V \to W$ is differentiable at a point $a \in V$ (refer to either of those books for a precise definition), then $Df_a$ shall denote the derivative of $f$ at the point $a$. This is a linear map from $V$ to $W$. So $Df_a(h)$ means the derivative of $f$ at $a$ applied to the vector $h \in V$.

Then you need to know the chain rule: \begin{equation} D(f \circ g)_a = Df_{g(a)} \circ Dg_a, \end{equation} and also that if $T: V \to W$ is a linear map then for any $a \in V$, $DT_a = T$. With all of this preliminary stuff out of the way, to solve your question define the following maps:

  • $F: \mathbb{R} \to \mathbb{R}$, defined by $F(\alpha) = \text{tr}(AD^{-3\alpha}B)$

  • The "left-multiplication by $A$" map defined by $L_A(X) = AX$, where $X$ is a matrix of appropriate size.

  • The "right-multiplication by $B$" map defined by $R_B(X) = XB$, where $X$ is a matrix of appropriate size.

  • $f: \mathbb{R} \to M_{n \times n}(\mathbb{R})$ (if $D$ is $n \times n$), defined by $f(\alpha) = D^{-3\alpha}$.

You wish to calculate $F'(\alpha)$, so we first begin by writing $F$ as a composition of $4$ functions: \begin{align} F(\alpha) = (\text{tr} \circ L_A \circ R_B \circ f)(\alpha). \end{align} So, by a direct application of the chain rule (and its corollary; see Chapter 3, Theorem 7.2 in Loomis and Sternberg), we can calculate $F'(\alpha)$ by the formula: \begin{equation} F'(\alpha) = \left[D(\text{tr})_{(L_A\circ R_B \circ f)(\alpha)} \circ D(L_A)_{(R_B \circ f)(\alpha)} \circ D(R_B)_{f(\alpha)} \right] \left( f'(\alpha) \right) \end{equation} i.e. $F'(\alpha)$ is the composition of those $3$ linear maps applied to $f'(\alpha)$. Notice that $\text{tr}, L_A, R_B$ are linear maps, so they are their own derivatives. Hence, \begin{align} F'(\alpha) &= \left[\text{tr} \circ L_A \circ R_B \right] \left( f'(\alpha) \right) \\ &= \text{tr} \left( A \cdot f'(\alpha) \cdot B \right) \end{align}

To compute $f'(\alpha)$, you just differentiate each entry of $D^{-3\alpha}$ with respect to $\alpha$, using $\frac{d}{d\alpha} x^{-3\alpha} = -3 \log(x)\, x^{-3\alpha}$ for $x > 0$. So if $D = \text{diag}(x_1, \dots, x_n)$ then \begin{align} f'(\alpha) &= \text{diag}(-3 \log(x_1) x_1^{-3\alpha}, \dots, -3 \log(x_n) x_n^{-3\alpha}) \\ &= -3 \log(D) \cdot D^{-3\alpha} \end{align}

So, finally, we get

\begin{equation} F'(\alpha) = -3 \text{tr} \left[A \cdot \text{log}(D) \cdot D^{-3\alpha} \cdot B \right] \end{equation}
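
As a quick consistency check (a sketch using only the definitions above, with $D = \text{diag}(x_1, \dots, x_n)$ and all $x_i > 0$), the trace can be expanded entrywise. Since the trace is invariant under cyclic permutations and $\log(D) \cdot D^{-3\alpha}$ is diagonal, only the diagonal entries of $BA$ survive:

\begin{align} F'(\alpha) &= -3\, \text{tr} \left[ BA \cdot \log(D) \cdot D^{-3\alpha} \right] \\ &= -3 \sum_{i=1}^{n} (BA)_{ii} \, \log(x_i) \, x_i^{-3\alpha} \end{align}

This is the same scalar as the $-3BA:FL$ obtained in the Frobenius-product answer, with $F = D^{-3\alpha}$ and $L = \log(D)$.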