Calculate the second-order derivative.

210 Views Asked by At

Let $x \in \mathbb{R}^m$ be our variable. I would like to know what is:

$$ \frac{\partial^2 \text{Tr}\big((A+B^\text{T}\textbf{diag}(x)B)^{-1}\big)}{\partial x_i \partial x_j}. $$ $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{m\times n} $ and $(A+B^\text{T}\textbf{diag}(x)B)$ is invertible.

The $\textbf{diag}$ operator on a matrix results in a vector of diagonal elements and $\textbf{diag}$ on a vector results in a matrix with diagonal elements on the main diagonal and zero elsewhere.

Hint: via matrixcookbook, I figured out that: $$ \frac{\partial \text{Tr}\big((A+B^\text{T}\textbf{diag}(x)B)^{-1}\big)}{\partial x} = \textbf{diag}(B(A+B^\text{T}\textbf{diag}(x)B)^{-2}B^{\text{T}}) $$

2

There are 2 best solutions below

1
On BEST ANSWER

Let $$\eqalign{ X &= {\rm Diag}(x) \cr U &= A + B^TXB \cr }$$ Write the function in terms of these variables and find its gradient $$\eqalign{ f &= {\rm tr}(U^{-1}) \cr\cr df &= -U^{-2}:dU \cr &= -U^{-2}:(B^TdXB) \cr &= -BU^{-2}B^T:dX \cr &= -BU^{-2}B^T:{\rm Diag}(dx) \cr &= -{\rm diag}(BU^{-2}B^T)^T\,dx \cr\cr g &= \frac{\partial f}{\partial x} = -{\rm diag}(BU^{-2}B^T) \cr }$$ which confirms the result from the Cookbook.

Next, find the gradient of $g$ $$\eqalign{ dg &= {\rm diag}\Big(B\big\{U^{-2}(dU)U^{-1}+U^{-1}(dU)U^{-2}\big\}B^T\Big) \cr &= {\rm diag}\Big(BU^{-2}(B^TdXB)U^{-1}B^T\Big) + {\rm diag}\Big(BU^{-1}(B^TdXB)U^{-2}B^T\Big) \cr\cr }$$ Now we need this Hadamard product trick $${\rm diag}(AXB) = (B^T\circ A)\,x$$ to eliminate the diag() operators $$\eqalign{ dg &= \Big[(BU^{-1}B^T)^T\circ(BU^{-2}B^T) + (BU^{-2}B^T)^T\circ(BU^{-1}B^T)\Big]\,dx \cr\cr H &= \frac{\partial g}{\partial x} = (BU^{-T}B^T)\circ(BU^{-2}B^T) + (BU^{-2T}B^T)\circ(BU^{-1}B^T) \cr\cr }$$ This result is the Hessian of the function which you asked about.

A nice way to summarize these results is $$\eqalign{ F &= BU^{-1}B^T \cr G &= BU^{-2}B^T \cr H &= F^T\circ G + G^T\circ F\cr g &= -{\rm diag}(G) \cr }$$

1
On

Let $l:x\rightarrow U=A+B^Tdiag(x)B, f=U\rightarrow tr(U^{-1}),g=f\circ l:x\rightarrow f(U)$ and $L_i$ be the $i^{th}$ row of $B$.

Since $l$ is an affine function, $Dl:h\in \mathbb{R}^m\rightarrow B^Tdiag(h)B$ and $D^2g_x(h,k)=D^2f_U(Dl(h),Dl(k))$. Moreover$Dl(e_i)=L_i^TL_i$ where $(e_i)_i$ is the canonical basis of $\mathbb{R}^m$. Note that $\dfrac{\partial^2 g}{\partial x_i\partial x_j}(x)=D^2g_x(e_i,e_j)=D^2f_U(Dl(e_i),Dl(e_j))=D^2f_U(H,K)$ where $H=L_i^TL_i,K=L_j^TL_j\in M_{n}$.

Here $Df_U(H)=-tr(U^{-1}HU^{-1})=-tr(U^{-2}H)$ and $D^2f_U(H,K)=tr(U^{-2}KU^{-1}H+U^{-1}KU^{-2}H)$.

Conclusion. $\dfrac{\partial^2 g}{\partial x_i\partial x_j}(x)=D^2g_x(e_i,e_j)=tr(U^{-2}L_j^TL_jU^{-1}L_i^TL_i+U^{-1}L_j^TL_jU^{-2}L_i^TL_i)$.