Find the derivative of a diagonal matrix and norm

134 Views Asked by At

Find the derivative with respect to $X \in \mathbf{R}^{n \times p} $ of $$ \Phi(X) = \operatorname{Tr} \left( X^{\top} H(X) X \right) $$ where $H(X) := D(X) A D(X)$, where $A$ is symmetric and $$D(X) = \operatorname{diag} \left( (X_{1,:}X_{1,:}^T)^{-\frac12}, \cdots, X_{n,:}X_{n,:}^T)^{-\frac12} \right)$$ where $X_{1,:}$ are the rows of $X$.


It's observed that $D(X)$ are made from inverse of norm 2 of diagonal matrix $X X^T$. If $H(X)$ doesn't depend on $X$, I know the derivative, but it's not interesting in our case. In

$\qquad\qquad\qquad$ Gradient of $Z \mapsto \left( Z^T \left( \operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)Z\right)$

I have found a similar case where two elegant solutions were proposed, one includes using $\delta X= e_i e_j^T$ and the other used the Hadamard product properties( which I have no knowledge about it), but in the diagonal, they use a vector instead. While I have the diagonal of a matrix, which is like the $diag^*$ proposed in the first solution.

2

There are 2 best solutions below

3
On

Here is a partial answer, I hope it helps you. Or perhaps someone can suggest how to complete it. I try to describe $\nabla \Phi (X)$, which is the matrix characterized by the equality $$\dfrac{\partial \Phi}{\partial V} (X) = \nabla \Phi (X) : V.$$ We compute $$\dfrac{\partial \Phi}{\partial V} (X) := \dfrac{\mathrm{d}}{\mathrm{d} t} \Phi(X + tV) \bigg|_{t=0}$$ as follows \begin{equation*} \begin{split} \dfrac{\mathrm{d}}{\mathrm{d} t} & \operatorname{Tr} \Big[ (X + tV)^{\operatorname{T}} H(X+tV) (X + tV) \Big] \bigg|_{t=0} \\ & = \operatorname{Tr} \Big[ V^{\operatorname{T}} H(X) X + X^{\operatorname{T}} \Big( \tfrac{\mathrm{d}}{\mathrm{d} t} H(X+tV) \big|_{t=0} \Big) X + X^{\operatorname{T}} H(X) V \Big] \\ & = \operatorname{Tr} \Big[ V^{\operatorname{T}} H(X) X + X^{\operatorname{T}} H(X) V \Big] + \operatorname{Tr} \Big[ X^{\operatorname{T}} \Big( \tfrac{\mathrm{d}}{\mathrm{d} t} H(X+tV) \big|_{t=0} \Big) X \Big] \\ & = 2 X^{\operatorname{T}} H(X) : V + \operatorname{Tr} \Big[ X^{\operatorname{T}} \, \tfrac{\partial H}{\partial V}(X) \, X \Big], \end{split} \end{equation*} where we have used that the trace is linear and that $H(X)$ is symmetric (since it is a product of symmetric matrices).

Once we can find a matrix $M(X)$ for which $$\operatorname{Tr} \Big[ X^{\operatorname{T}} \, \tfrac{\partial H}{\partial V}(X) \, X \Big] = \operatorname{Tr} \big[ M(X)V \big] = M(X)^{\operatorname{T}} : V,$$ we are done and we can conclude that $$\nabla \Phi (X) = 2 X^{\operatorname{T}} H(X) + M(X)^{\operatorname{T}}.$$

Before that, it remains to understand the term $$\dfrac{\partial H}{\partial V}(X) = \dfrac{\mathrm{d}}{\mathrm{d} t} H(X+tV) \bigg|_{t=0}$$ and plug it into our expression.

To compute $\partial H/\partial V$ we first analyse $\partial D/\partial V$. Denote by $|X_{i}| = \sqrt{X_{i}X_{i}^{\operatorname{T}}}$ the Euclidean norm of the row $X_{i}$. Then \begin{equation*} \begin{split} \dfrac{\partial D}{\partial V}(X) = \dfrac{\mathrm{d}}{\mathrm{d} t} D(X+tV) \bigg|_{t=0} & = \dfrac{\mathrm{d}}{\mathrm{d} t} \left. \begin{bmatrix} |X_{1} + tV_{1}|^{-1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & |X_{n} + tV_{n}|^{-1} \end{bmatrix}\right|_{t=0} \\ & = \begin{bmatrix} -|X_{1}|^{-3} X_{1}V_{1}^{\operatorname{T}} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & -|X_{n}|^{-3} X_{n}V_{n}^{\operatorname{T}} \end{bmatrix} \\ & = (Y V^{T}) \odot I, \end{split} \end{equation*} where $Y$ is the matrix with $i$-th row equal to $-|X_{i}|^{-3} X_{i}$. Hence, the derivative of $H$ is obtained by the product rule as \begin{equation*} \begin{split} \dfrac{\partial H}{\partial V}(X) & = \dfrac{\partial D}{\partial V}(X) A D(X) + D(X)A \dfrac{\partial D}{\partial V}(X) \\ & = \big[ (Y V^{T}) \odot I \big] A D(X) + D(X)A \big[(Y V^{T}) \odot I\big]. \end{split} \end{equation*}

We just plug this in our expression for the directional derivative to obtain $\partial \Phi/\partial V$, but the problem remains to rewrite our expression in order to obtain the matrix $M(X)$ and consequently $\nabla \Phi(X)$. I am not sure the Hadamard product is the right way to go here, but I tried to use it since you mentioned; maybe some of its properties can be used to express $(Y V^{T}) \odot I$ differently, I don't know.

Explicitly, it remains to find $M(X)$ such that $$\operatorname{Tr} \Big[ X^{\operatorname{T}} \, \Big( \big[(Y V^{T}) \odot I \big] A D(X) + D(X)A \big[(Y V^{T}) \odot I\big] \Big) \, X \Big] = \operatorname{Tr} \big[ M(X)V \big].$$

Or, not sure if it helps, the LHS is the same as $$2 \cdot \operatorname{Tr} \Big[ X^{\operatorname{T}} \, \big[(Y V^{T}) \odot I \big]\, A \, D(X) \, X \Big].$$

5
On

$ \def\BR#1{\Big(#1\Big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\sym#1{\op{sym}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $For typing convenience, define the matrix variables $$\eqalign{ Y &= {XX^T} &\qiq dY = \LR{dX\;X^T+X\;dX^T} \equiv 2\,\sym{dX\:X^T} \\ B &= I\odot Y &\qiq dB = 2I\odot\sym{dX\:X^T} \;=\; 2I\odot\LR{dX\:X^T} \\ D &= B^{-1/2} &\qiq dD = -\tfrac12 D^3\:dB \;=\; -D^3\odot\LR{dX\:X^T} \\ H &= DAD &\qiq dH = {dD\;AD + DA\;dD} \\ }$$ and the Frobenius product $(:)$ which is a concise notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \} \\ A:B &= B:A \;=\; B^T:A^T \\ \LR{C\odot A}:B &= C:\LR{A\odot B} \\ \LR{AB}:C &= A:\LR{CB^T} \;=\; B:\LR{A^TC} \\ }$$ Use the above notation to write the function, then calculate its differential and gradient $$\eqalign{ \Phi &= H:Y \\ d\Phi &= H:dY + Y:dH \\ &= H:2\,\sym{dX\:X^T} + {Y}:\LR{dD\:AD+DA\:dD} \\ &= 2HX:dX + \LR{YDA+ADY}:dD \\ &= 2HX:dX - \LR{YDA+ADY}:D^3\odot\LR{dX\:X^T} \\ &= 2HX:dX - D^3\odot\LR{YDA+ADY}:\LR{dX\:X^T} \\ &= 2HX:dX - 2D^3\odot{ADY}:\LR{dX\:X^T} \\ &= 2HX:dX - 2\LR{D^3\odot{ADY}}X:dX \\ &= 2\,\BR{H-D^3\odot{ADY}}X:dX \\ \grad{\Phi}{X} &= 2\LR{H-D^3\odot{ADXX^T}}X \\ }$$ My initial calculation neglected the $A$ matrix.