For matrices $C, X$, and $B$, I know that $\frac{\partial}{\partial X}||CXB||_F^2 = 2C^TCXBB^T$, and that $\partial X^{-1} = -X^{-1}(\partial X) X^{-1}$. However, I am unable to combine these results to calculate $\frac{\partial}{\partial X}||CX^{-1}B||_F^2$.
Calculating the matrix derivative $\frac{\partial}{\partial X}||CX^{-1}B||_F^2$
Asked by user17762 (https://math.techqa.club/user/user17762/detail). There are 3 answers below.
On
The function $X \to \|CX^{-1}B\|_F^2$ is a composition of the functions $g(X)= X^{-1}$ followed by $f(Y) = \|CYB\|_F^2$.
The key fact about the multidimensional derivative is the following: for $f : \mathbb R^n \to \mathbb R^m$ differentiable everywhere, the derivative $f'$ assigns to each $y \in \mathbb R^n$ a linear transformation $f'(y) : \mathbb R^n \to \mathbb R^m$ (hence a matrix), which is given by $$ f'(y)v = \lim_{t \to 0}\frac{f(y+tv) - f(y)}{t} $$
Therefore, identifying $M_{n \times n}$ with $\mathbb R^{n^2}$ in the explanation above, we have $f : M_{n \times n} \to \mathbb R$, hence $f'(Y)$ for any $Y \in M_{n \times n}$ is a linear transformation from $M_{n \times n} \to \mathbb R$. The gradient you have derived represents this linear map through the trace inner product, so the result is a scalar: $$ [f'(Y)]M = \operatorname{Tr}\big((2C^TCYBB^T)^TM\big) $$
Similarly, $g(X) = X^{-1}$ is a function from $M_{n \times n} \to M_{n \times n}$, so $g'(Y)$ for any $Y \in M_{n \times n}$ is a linear transformation from $M_{n \times n} \to M_{n \times n}$ given by : $$ [g'(Y)]N = -Y^{-1}NY^{-1} $$
The chain rule tells you that the derivative of $f \circ g$ is : $$ [f \circ g]'(X) = f'(g(X)) \cdot g'(X) $$
where $\cdot$ indicates matrix multiplication (or composition of the linear transformations which are the derivatives).
In particular, for any $M$ we have : $$ [[f \circ g]'(X)]M = f'(g(X)) [[g'(X)]M] $$
Note that if $f \circ g$ is well defined then this matrix multiplication will also go through without a dimension problem.
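In our case, we have $[g'(X)]M = -X^{-1}MX^{-1}$, and finally, we get that $$ [[f \circ g]'(X)](M) = f'(g(X))\big[[g'(X)]M\big] = -2\operatorname{Tr}\big((C^TCX^{-1}BB^T)^TX^{-1}MX^{-1}\big) $$
is how the derivative acts as a linear map at each $M \in M_{n \times n}$. Using the cyclic property of the trace, this equals $\operatorname{Tr}(G^TM)$ with $G = -2(X^{-1})^TC^TCX^{-1}BB^T(X^{-1})^T$, so $G$ is the gradient $\frac{\partial}{\partial X}\|CX^{-1}B\|_F^2$.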
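As a sanity check, the directional derivative of $X \to \|CX^{-1}B\|_F^2$ along $M$ can be compared with a central finite difference. The sketch below uses NumPy and evaluates the chain-rule expression in its scalar (trace) form; the sizes, the random seed, and the helper name `h` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
C = rng.standard_normal((3, n))
B = rng.standard_normal((n, 5))
# Assumed test point: a perturbation of n*I, chosen so X is well conditioned
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
M = rng.standard_normal((n, n))  # arbitrary direction

def h(X):
    """h(X) = ||C X^{-1} B||_F^2 (hypothetical helper name)."""
    return np.linalg.norm(C @ np.linalg.inv(X) @ B, "fro") ** 2

Xi = np.linalg.inv(X)
grad_f = 2 * C.T @ C @ Xi @ B @ B.T   # gradient of f(Y) = ||CYB||_F^2 at Y = X^{-1}
g_prime_M = -Xi @ M @ Xi              # [g'(X)]M for g(X) = X^{-1}
analytic = np.trace(grad_f.T @ g_prime_M)

t = 1e-6
numeric = (h(X + t * M) - h(X - t * M)) / (2 * t)

print(analytic, numeric)
```

The two printed numbers should agree to several significant digits if the chain-rule expression is right.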
On
For ease of typing, let's define $$ F = CX^{-1}B$$
Its differential is given by: $$dF = (dC)X^{-1}B + C(dX^{-1})B + CX^{-1}(dB)$$
We have $dC=0$ and $dB=0$ (since $C$ and $B$ are constant), and we can calculate $dX^{-1}$ as follows:
\begin{equation} \begin{split} X^{-1}X & = I \\ \implies dX^{-1}X + X^{-1}dX & = dI = 0 \\ \implies dX^{-1}X & = - X^{-1}dX \\ \implies dX^{-1} & = - X^{-1}(dX)X^{-1} \\ \end{split} \end{equation}
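The identity $dX^{-1} = -X^{-1}(dX)X^{-1}$ is a first-order statement, so one rough way to check it is to perturb an invertible matrix slightly and compare the actual change in the inverse with the predicted one. The following is a minimal NumPy sketch; the matrix size, seed, and perturbation scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Assumed test point: a perturbation of n*I, chosen so X is well conditioned
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))  # small perturbation

Xi = np.linalg.inv(X)
actual_change = np.linalg.inv(X + dX) - Xi   # exact change in the inverse
first_order = -Xi @ dX @ Xi                  # predicted by the differential

# Relative mismatch; expected to be of the order of ||X^{-1} dX||
rel_err = np.linalg.norm(actual_change - first_order) / np.linalg.norm(first_order)
print(rel_err)
```

The mismatch shrinks in proportion to the size of `dX`, which is what one expects from a first-order approximation.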
Back to our expression, we have,
\begin{equation}
\begin{split}
Y &= ||F||_F^2 \\
& = \text{Tr}(F^TF) = F:F \\
\implies dY & = dF:F + F:dF \\
& = F:dF + F:dF \\
& = 2F:dF \\
& = 2CX^{-1}B:C(dX^{-1})B \\
& = 2CX^{-1}B:-CX^{-1}(dX)X^{-1}B \\
& = -2(CX^{-1})^TCX^{-1}B:(dX)X^{-1}B \\
& = -2(CX^{-1})^TCX^{-1}B(X^{-1}B)^T:dX \\
\end{split}
\end{equation}
Finally, we get: \begin{equation} \begin{split} \frac{\partial \|CX^{-1}B\|_F^2}{\partial X} &= -2(CX^{-1})^TCX^{-1}B(X^{-1}B)^T\\ &= -2(X^{-1})^TC^TCX^{-1}BB^T(X^{-1})^T \end{split} \end{equation}
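The closed-form gradient $-2(X^{-1})^TC^TCX^{-1}BB^T(X^{-1})^T$ can be checked entry by entry against central finite differences of $\|CX^{-1}B\|_F^2$. This is only a sketch with assumed sizes and random data; `phi` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
C = rng.standard_normal((2, n))
B = rng.standard_normal((n, 4))
# Assumed test point: a perturbation of n*I, chosen so X is well conditioned
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))

def phi(X):
    """phi(X) = ||C X^{-1} B||_F^2 = F:F with F = C X^{-1} B."""
    F = C @ np.linalg.inv(X) @ B
    return np.sum(F * F)

Xi = np.linalg.inv(X)
grad = -2 * Xi.T @ C.T @ C @ Xi @ B @ B.T @ Xi.T  # closed-form gradient

# Central finite difference in each entry X[i, j]
eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        grad_fd[i, j] = (phi(X + E) - phi(X - E)) / (2 * eps)

max_abs_diff = np.max(np.abs(grad - grad_fd))
print(max_abs_diff)
```

A small `max_abs_diff` relative to the entries of `grad` supports the formula, including the placement of the transposes.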
Define a new matrix variable $$\eqalign{ Y = X^{-1} \qquad\implies\qquad dY = -Y\,dX\,Y \\ }$$
Use the gradient, which you already know, to write the differential of the function in terms of $Y$ $$\eqalign{ \phi &= \|CYB\|_F^2 \\ d\phi &= 2\Big(C^TCYBB^T\Big):dY \\ }$$
Then perform a change of variables from $Y\to X$ $$\eqalign{ d\phi &= 2\Big(C^TCYBB^T\Big):\Big(-Y\,dX\,Y\Big) \\ &= -2\Big(Y^TC^TCYBB^TY^T\Big):dX \\ &= -2\Big((X^{-1})^TC^TCX^{-1}BB^T(X^{-1})^T\Big):dX \\ \frac{\partial \phi}{\partial X} &= -2(X^{-1})^TC^TCX^{-1}BB^T(X^{-1})^T \\ }$$
In the above, a colon is used as a product notation for the trace, i.e. $$\eqalign{A:B = {\rm Tr}(A^TB) = {\rm Tr}(B^TA) = B:A}$$ The terms in such a product can be rearranged in a number of equivalent ways, e.g. $$\eqalign{ A:B &= A^T:B^T \\ A:BC &= B^TA:C = AC^T:B \\ }$$ due to the properties of the trace function.
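The rearrangement rules for the colon product are easy to spot-check numerically. The sketch below defines a hypothetical helper `colon` as $\operatorname{Tr}(A^TB)$ and tests each identity on random rectangular matrices with compatible (assumed) shapes; `D` is an extra matrix of the same shape as `A`, introduced only for the test.

```python
import numpy as np

rng = np.random.default_rng(3)

def colon(A, B):
    """Trace (Frobenius) inner product A:B = Tr(A^T B)."""
    return np.trace(A.T @ B)

A = rng.standard_normal((3, 5))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))
D = rng.standard_normal((3, 5))  # same shape as A

sym = np.isclose(colon(A, D), colon(D, A))                  # A:D = D:A
transp = np.isclose(colon(A, D), colon(A.T, D.T))           # A:D = A^T:D^T
move_left = np.isclose(colon(A, B @ C), colon(B.T @ A, C))  # A:BC = B^T A:C
move_right = np.isclose(colon(A, B @ C), colon(A @ C.T, B)) # A:BC = A C^T:B

print(sym, transp, move_left, move_right)
```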
As you have discovered, the chain rule is difficult to apply in matrix calculus. It often requires the calculation of intermediate quantities which are third- and fourth-order tensors.
The beauty of the differential approach is that the differential of a matrix acts like a matrix. In particular, it obeys all of the rules of matrix algebra.