Matrix derivative of the Frobenius norm of a product containing an inverse


Let $A\in\mathbb{R}^{n\times d}$ and $X\in\mathbb{R}^{d\times d}$ with $d>n$. Let $A$ have rank $n$ and let $X$ be invertible. What is the derivative of $$\Vert XA^T(AXA^T)^{-1} - A^T(AA^T)^{-1}\Vert_F^2$$ with respect to $X$? Here, $\Vert M \Vert_F^2 = \operatorname{Tr}(M^TM)$ for any matrix $M$.

A related question that would help with the above: is it possible to compute the derivative of $$\operatorname{Tr}(U(X)V(X))$$ with respect to $X$ in terms of the derivatives of $\operatorname{Tr}(U(X))$ and $\operatorname{Tr}(V(X))$ with respect to $X$? Here $U$ and $V$ are matrix-valued functions of $X$.

I found the "Scalar-by-matrix" section of https://en.wikipedia.org/wiki/Matrix_calculus useful in similar problems.


There are 2 best solutions below

Best answer

Let $\mathbf{C}= \mathbf{X} \mathbf{A}^T (\mathbf{A}\mathbf{X}\mathbf{A}^T)^{-1} - \mathbf{A}^T (\mathbf{A}\mathbf{A}^T)^{-1}$ and $\mathbf{D} = \mathbf{A}\mathbf{X}\mathbf{A}^T$, where $\mathbf{D}$ is assumed to be invertible so that $\mathbf{C}$ is well defined.

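With this notation we can write $\phi = \| \mathbf{C} \|_F^2 = \mathbf{C}:\mathbf{C}$, where the colon denotes the Frobenius inner product, $\mathbf{P}:\mathbf{Q} = \operatorname{Tr}(\mathbf{P}^T\mathbf{Q})$.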

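Since $d(\mathbf{D}^{-1}) = -\mathbf{D}^{-1}(d\mathbf{D})\mathbf{D}^{-1}$ and $d\mathbf{D} = \mathbf{A}(d\mathbf{X})\mathbf{A}^T$, it follows that \begin{eqnarray} d\phi &=& 2 \mathbf{C}:d\mathbf{C} \\ &=& 2 \mathbf{C}:(d\mathbf{X}) \mathbf{A}^T \mathbf{D}^{-1} - 2 \mathbf{C}:\mathbf{X} \mathbf{A}^T \mathbf{D}^{-1}(d\mathbf{D})\mathbf{D}^{-1}\\ &=& 2 \mathbf{C}\mathbf{D}^{-T} \mathbf{A}:d\mathbf{X} - 2 \mathbf{D}^{-T}\mathbf{A}\mathbf{X}^T\mathbf{C} \mathbf{D}^{-T}: \mathbf{A}(d\mathbf{X})\mathbf{A}^T \end{eqnarray} Moving $\mathbf{A}$ and $\mathbf{A}^T$ to the left-hand side of the inner product in the last term and collecting everything against $d\mathbf{X}$, the gradient $\partial\phi/\partial\mathbf{X}$ simplifies to $$ 2 (\mathbf{I} - \mathbf{A}^T \mathbf{D}^{-T}\mathbf{A}\mathbf{X}^T)\mathbf{C} \mathbf{D}^{-T} \mathbf{A} $$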
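A closed-form gradient like this is easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy is available. The function names `phi`, `grad_phi` and `numerical_grad`, the sizes `n = 3`, `d = 5`, and the random test matrices are illustrative choices, not part of the derivation. It compares the formula against central finite differences taken entry by entry.

```python
import numpy as np

def phi(X, A):
    """Objective ||X A^T (A X A^T)^{-1} - A^T (A A^T)^{-1}||_F^2."""
    D = A @ X @ A.T
    C = X @ A.T @ np.linalg.inv(D) - A.T @ np.linalg.inv(A @ A.T)
    return np.sum(C * C)

def grad_phi(X, A):
    """Closed-form gradient 2 (I - A^T D^{-T} A X^T) C D^{-T} A."""
    D = A @ X @ A.T
    D_inv = np.linalg.inv(D)
    C = X @ A.T @ D_inv - A.T @ np.linalg.inv(A @ A.T)
    I = np.eye(X.shape[0])
    return 2.0 * (I - A.T @ D_inv.T @ A @ X.T) @ C @ D_inv.T @ A

def numerical_grad(f, X, h=1e-6):
    """Central finite differences with respect to each entry x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2.0 * h)
    return G

# Illustrative test problem (assumed sizes and random data).
rng = np.random.default_rng(0)
n, d = 3, 5
A = rng.standard_normal((n, d))                    # generically of rank n
X = np.eye(d) + 0.1 * rng.standard_normal((d, d))  # perturbation of the identity

G = grad_phi(X, A)
G_num = numerical_grad(lambda Y: phi(Y, A), X)

# Relative discrepancy between the formula and finite differences.
rel_err = np.linalg.norm(G - G_num) / np.linalg.norm(G_num)
print("relative error:", rel_err)
```

A small relative error supports the formula at the sampled point but is of course not a proof.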

Another answer

There are various ways to differentiate with respect to a matrix. The one used in the linked article is differentiation with respect to each entry of $X$, which we denote here by $x_{ij}$, $i,j=1,\ldots,d$.

Using the trace formulation, we need to compute the derivative of

$$\mathrm{trace}[(XA^T(AXA^T)^{-1} - A^T(AA^T)^{-1})^T(XA^T(AXA^T)^{-1} - A^T(AA^T)^{-1})].$$

Since the trace is a linear operator, we can see that the only really troublesome term here is the inverse term. Luckily, we can show that

$$\dfrac{\partial}{\partial x_{ij}}(AXA^T)^{-1}=-(AXA^T)^{-1}A\dfrac{\partial X}{\partial x_{ij}}A^T(AXA^T)^{-1}$$

which, since $\partial X/\partial x_{ij} = e_ie_j^T$, can be rewritten as

$$\dfrac{\partial}{\partial x_{ij}}(AXA^T)^{-1}=-(AXA^T)^{-1}Ae_ie_j^TA^T(AXA^T)^{-1}$$
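The identity above can be checked numerically for a single entry $x_{ij}$. This is a rough sketch assuming NumPy. The sizes, the random matrices, the chosen index pair `(i, j)` and the helper name `inv_D` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 3, 5
A = rng.standard_normal((n, d))                    # generically of rank n
X = np.eye(d) + 0.1 * rng.standard_normal((d, d))  # perturbation of the identity

def inv_D(X):
    """Return (A X A^T)^{-1} for the fixed matrix A above."""
    return np.linalg.inv(A @ X @ A.T)

# Pick one entry x_ij (0-based indices) and build e_i e_j^T.
i, j = 1, 3
basis = np.eye(d)
E_ij = np.outer(basis[i], basis[j])

# Analytic derivative: -(A X A^T)^{-1} A e_i e_j^T A^T (A X A^T)^{-1}.
D_inv = inv_D(X)
analytic = -D_inv @ A @ E_ij @ A.T @ D_inv

# Central finite difference in the direction e_i e_j^T.
h = 1e-6
numeric = (inv_D(X + h * E_ij) - inv_D(X - h * E_ij)) / (2.0 * h)

inv_rel_err = np.linalg.norm(analytic - numeric) / np.linalg.norm(analytic)
print("relative error:", inv_rel_err)
```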

where $(e_1,\ldots,e_d)$ is the standard basis of $\mathbb{R}^d$. The rest of the derivation is standard algebra: apply the product rule to each term inside the trace.
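One way the remaining algebra might go, sketched with the shorthand $D = AXA^T$ and $C = XA^TD^{-1} - A^T(AA^T)^{-1}$ (notation borrowed from the other answer, introduced here only for brevity): the product rule together with the derivative of the inverse gives

$$\dfrac{\partial C}{\partial x_{ij}} = e_ie_j^TA^TD^{-1} - XA^TD^{-1}Ae_ie_j^TA^TD^{-1},$$

and, since the objective is $\mathrm{trace}[C^TC]$, the cyclic property of the trace yields

$$\dfrac{\partial}{\partial x_{ij}}\mathrm{trace}[C^TC] = 2\,\mathrm{trace}\left[C^T\dfrac{\partial C}{\partial x_{ij}}\right] = 2\left[CD^{-T}A - A^TD^{-T}AX^TCD^{-T}A\right]_{ij}.$$

Stacking these entries into a $d\times d$ matrix gives $2(I - A^TD^{-T}AX^T)CD^{-T}A$, which agrees with the gradient obtained in the other answer.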

If you need instead a solution in terms of the directional derivative, let me know, and I will update my answer. It is getting late here.