Derivative of a function of trace


Suppose $X$ is a diagonal matrix, $X \in \mathbb{R}^{m \times m}$. Let $f\colon\mathbb{R} \to \mathbb{R}$ be a twice differentiable function. Find $$\nabla^2_X f(\operatorname{tr}(X)),$$ where $\nabla^2_X$ denotes the Hessian with respect to $X$.

Attempt: I have no idea how to deal with the case when there is a trace involved. I do know for a fact that $$\nabla^2_u f(Au) = A^T f''(Au)\,A$$ when $u$ is a vector and $Au$ is a scalar.

There are 3 answers below.

BEST ANSWER

Let's define $$g:\mathbb{R}^{m\times m}\rightarrow \mathbb{R}, \quad g(X) = f\left(\operatorname{Tr}(X)\right) = f\left(\textstyle\sum_i X_{ii}\right).$$ Notice that this function is constant in the off-diagonal elements of $X$, so any partial derivative that involves an off-diagonal element must be zero. So even though I've defined $g$ on all $m\times m$ matrices, not just the diagonal ones, the Hessian still involves just the diagonal.

Now consider a partial derivative with respect to a single diagonal element. A simple application of the chain rule gives us $$\frac{\partial g(X)}{\partial X_{jj}} = f'\left(\sum_i X_{ii}\right) \cdot \frac{\partial}{\partial X_{jj}} \sum_i X_{ii} = f'\left(\sum_i X_{ii}\right).$$ For the second derivative, then, we get $$\frac{\partial^2 g(X)}{\partial X_{kk}\partial X_{jj}} = \frac{\partial}{\partial X_{kk}} f'\left(\sum_i X_{ii}\right) = f''\left(\sum_i X_{ii}\right) \cdot \frac{\partial}{\partial X_{kk}} \sum_i X_{ii} = f''\left(\sum_i X_{ii}\right).$$ The second partial derivative is the same for all combinations of the diagonal elements.
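The first-derivative claim is easy to check numerically. The sketch below (my own illustration, not part of the original answer) uses central differences with $f(t)=t^3$, so $f'(t)=3t^2$:

```python
# Verify that for g(X) = f(Tr(X)) every first partial w.r.t. a diagonal
# entry equals f'(Tr X), and every partial w.r.t. an off-diagonal entry
# is zero.  Test function: f(t) = t**3, so f'(t) = 3 t**2.
def f(t):
    return t ** 3

def g(X):
    # g(X) = f(Tr(X)) = f(sum of diagonal entries)
    return f(sum(X[i][i] for i in range(len(X))))

def partial(X, i, j, h=1e-5):
    # central difference approximation of dg/dX_ij
    Xp = [row[:] for row in X]; Xp[i][j] += h
    Xm = [row[:] for row in X]; Xm[i][j] -= h
    return (g(Xp) - g(Xm)) / (2 * h)

X = [[1.0, 0.2], [0.4, 2.0]]     # Tr(X) = 3, so f'(Tr X) = 27
grads = [[partial(X, i, j) for j in range(2)] for i in range(2)]
print(grads)   # ~ [[27, 0], [0, 27]]
```

The off-diagonal entries of the gradient come out as exact zeros, since perturbing them never changes the trace.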

So far so good. But here's the problem: what does the Hessian of a matrix function look like? For a function defined on $\mathbb{R}^n$, we generally treat the Hessian as a symmetric matrix, but here we can't do that. A generalizable notion is that the Hessian is a symmetric linear mapping from the input space back onto itself. If we do this right, it fits neatly into a Taylor expansion $$g(X+tZ) \approx g(X) + t \langle \nabla g(X), Z \rangle + \tfrac{1}{2} t^2 \langle Z, \nabla^2 g(X) [Z] \rangle,$$ where $Z$ is the search direction and $t$ is a scalar. In our case, then, we get $$\nabla^2 g(X)[Z] = f''(\operatorname{Tr}(X)) \cdot \operatorname{Tr}(Z) \cdot I.$$
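One way to sanity-check this Hessian action is to plug it into the Taylor expansion above and watch the remainder shrink like $t^3$. A minimal sketch (my own illustration, with $f = \exp$ chosen so $f = f' = f''$):

```python
# Check the quadratic Taylor model built from
#   grad g(X)      = f'(Tr X) I
#   Hess g(X)[Z]   = f''(Tr X) Tr(Z) I
# against the exact value g(X + tZ) = f(Tr X + t Tr Z).
import math

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

f = math.exp                    # f = f' = f'' = exp, a convenient test case
X = [[0.1, 0.5], [0.2, 0.3]]
Z = [[0.4, -0.1], [0.7, 0.2]]
tX, tZ = trace(X), trace(Z)

errs = []
for t in (1e-1, 1e-2, 1e-3):
    exact = f(tX + t * tZ)
    # <grad g(X), Z> = f'(Tr X) Tr(Z);  <Z, Hess g(X)[Z]> = f''(Tr X) Tr(Z)**2
    model = f(tX) + t * f(tX) * tZ + 0.5 * t * t * f(tX) * tZ ** 2
    errs.append(abs(exact - model))
print(errs)                     # shrinks roughly like t**3
```

Each tenfold reduction in $t$ cuts the model error by roughly a factor of 1000, exactly the cubic remainder a correct second-order model should leave.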

How do we confirm this? Well, let's look at partial derivatives. For a function $h(x)$ defined on $\mathbb{R}^n$, we have $$\frac{\partial^2 h(x)}{\partial x_i \partial x_j} = \left(\nabla^2 h(x)\right)_{ij} = \langle e_i, \nabla^2 h(x)\, e_j \rangle,$$ where $e_i$, $e_j$ are vectors with ones at positions $i$ and $j$, respectively, and zeros everywhere else. For our matrix function, we have $$\frac{\partial^2 g(X)}{\partial X_{ij} \partial X_{kl}} = \langle E_{ij}, \nabla^2 g(X) [E_{kl}] \rangle = \left\langle E_{ij}, f''(\operatorname{Tr}(X)) \cdot \operatorname{Tr}(E_{kl}) \cdot I \right\rangle,$$ where $E_{ij}$, $E_{kl}$ are matrices with zeros everywhere except for a one in positions $ij$ and $kl$, respectively. Simplifying, $$\begin{aligned} &\left\langle E_{ij}, f''(\operatorname{Tr}(X)) \cdot \operatorname{Tr}(E_{kl}) \cdot I \right\rangle = f''(\operatorname{Tr}(X)) \cdot \operatorname{Tr}(E_{kl}) \langle E_{ij}, I \rangle \\&\qquad = f''(\operatorname{Tr}(X)) \cdot \operatorname{Tr}(E_{kl}) \cdot \operatorname{Tr}(E_{ij}) = \begin{cases} f''(\operatorname{Tr}(X)) & i=j,\ k=l \\ 0 & \text{otherwise,} \end{cases} \end{aligned}$$ which is exactly what we were expecting: zero for any partial derivative involving an off-diagonal element, and the same value $f''(\operatorname{Tr}(X))$ for all others.
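The case analysis above can be confirmed by brute force: compute every second partial $\partial^2 g/\partial X_{ij}\partial X_{kl}$ by finite differences and compare it against the predicted pattern. A small sketch (my own check, not from the answer, using $f(t)=t^4$ so $f''(t)=12t^2$):

```python
# Compare every second partial of g(X) = f(Tr(X)) against the prediction:
#   f''(Tr X)  when i == j and k == l,  0 otherwise.
def f(t):
    return t ** 4

def fpp(t):
    return 12 * t ** 2          # second derivative of t**4

def g(X):
    return f(sum(X[i][i] for i in range(len(X))))

def second_partial(X, i, j, k, l, h=1e-4):
    # mixed central difference d^2 g / dX_ij dX_kl
    def bump(si, sj):
        Y = [row[:] for row in X]
        Y[i][j] += si * h
        Y[k][l] += sj * h
        return Y
    return (g(bump(1, 1)) - g(bump(1, -1))
            - g(bump(-1, 1)) + g(bump(-1, -1))) / (4 * h * h)

m = 2
X = [[0.5, 0.1], [0.2, 1.5]]    # Tr(X) = 2, so f''(Tr X) = 48
ok = True
for i in range(m):
    for j in range(m):
        for k in range(m):
            for l in range(m):
                want = fpp(2.0) if (i == j and k == l) else 0.0
                ok = ok and abs(second_partial(X, i, j, k, l) - want) < 1e-3
print(ok)
```

All 16 index combinations match: $48$ for the four diagonal pairs, zero everywhere else.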

ANSWER

Following up on lythia's comment: if you differentiate $f(\operatorname{tr}(X))$ with respect to an off-diagonal entry of $X$ you get zero, and if you differentiate twice with respect to diagonal entries $X_{jj}$ and $X_{kk}$ (equal or not) you just get $f''$. Packaging those second partials as a linear map on matrices gives $\nabla^2_X f\,[Z] = f''\operatorname{tr}(Z)\,I$ for a direction $Z$, in agreement with the accepted answer.

ANSWER

Given $$\eqalign{ t &= {\rm tr}(X) \cr f &= f(t) \cr }$$ the differentials of these quantities are $$\eqalign{ dt &= I:dX \cr df &= f'\,dt = f'\,I:dX. \cr }$$ Since $df = \frac{\partial f}{\partial X}:dX$, the gradient is $$G = \frac{\partial f}{\partial X} = I\,f'.$$ You can play this game again with $G$ to get the Hessian $$\eqalign{ dG &= I\,df' \cr &= I\,f''\,dt \cr &= I\,f''\,I:dX \cr &= (I\,I\,f''):dX \cr H &= \frac{\partial G}{\partial X} = I\,I\,f'' \cr }$$ where $I\,I$ is the dyadic product of the identity matrix $I$ with itself. It is a 4th-order isotropic tensor which is widely used in continuum mechanics.
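Both differential identities can be checked numerically by taking a small concrete $dX$. A quick sketch (my own illustration, with $f = \sin$ as an arbitrary test function), where the Frobenius pairing $A{:}B = \sum_{ij} A_{ij}B_{ij}$ makes $I{:}dX = \operatorname{tr}(dX)$:

```python
# Check  df = f' I : dX  and  dG = f'' (I : dX) I  for g(X) = f(tr(X)).
import math

def frob(A, B):
    # Frobenius pairing A:B = sum_ij A_ij B_ij
    return sum(A[i][j] * B[i][j] for i in range(len(A)) for j in range(len(A)))

f, fp, fpp = math.sin, math.cos, (lambda t: -math.sin(t))
I = [[1.0, 0.0], [0.0, 1.0]]

X = [[0.3, 0.8], [0.1, 0.5]]              # tr(X) = 0.8
dX = [[2e-6, -1e-6], [0.5e-6, 3e-6]]
tX = 0.8
trdX = frob(I, dX)                        # I : dX = tr(dX) = 5e-6

# df should equal f'(tr X) * (I : dX) to first order
df_err = abs((f(tX + trdX) - f(tX)) - fp(tX) * trdX)

# dG should equal f''(tr X) * (I : dX) * I; since G = f'(tr X) I, each
# diagonal entry of G changes by f'(tr X + tr dX) - f'(tr X)
dG_err = abs((fp(tX + trdX) - fp(tX)) - fpp(tX) * trdX)

print(df_err, dG_err)                     # both ~ O(|dX|^2), i.e. ~1e-11
```

The residual errors are quadratic in $\lVert dX\rVert$, as expected of a first-order differential identity.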

In index notation, this Hessian has a particularly simple form $$\eqalign{ H_{ijkl} &= \delta_{ij}\,\delta_{kl}\,f'' \cr }$$