Derivative of matrix $(A^TA)^{-1/2}D(A^TA)^{-1/2}$ w.r.t. $A_{ij}$


I want to find the derivative of the matrix $(A^TA)^{-1/2}D(A^TA)^{-1/2}$ with respect to $A_{ij}$, where $D$ is a diagonal matrix. Alternatively, it would also be fine to have

$$\frac{\partial}{\partial A_{ij}} a^T(A^TA)^{-1/2}D(A^TA)^{-1/2}b$$

Is there any reference for such a problem? I have the Matrix Cookbook, which gives results when $D=I$, but how does this general form evaluate?

$\def\o{{\tt1}}\def\n{{\vec\nu}}\def\m{{\vec\mu}}$Define the matrix variable $$\eqalign{ B &= (A^TA)^{1/2} \;=\; B^T \\ }$$ and calculate its differential and vectorize it $$\eqalign{ B^2 &= A^TA \\ dB\,B+B\,dB &= dA^TA+A^TdA \\ \big(B\otimes I+I\otimes B\big)\,db &= \Big((A^T\otimes I)K+(I\otimes A^T)\Big)\,da \\ M\,db &= N\,da \\ }$$ where $K$ is the commutation matrix associated with Kronecker products.
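This identity is easy to sanity-check numerically. Here is a sketch in Python/NumPy (using SciPy's `sqrtm`; the `commutation_matrix` helper is written out for illustration, and `vec` is taken column-major so that it matches the Kronecker identities):

```python
import numpy as np
from scipy.linalg import sqrtm

def commutation_matrix(m, n):
    # K satisfies K @ vec(X) = vec(X.T) for X of shape (m, n),
    # with vec taken column-major (Fortran order)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
dA = 1e-6 * rng.standard_normal((m, n))

B = sqrtm(A.T @ A)                        # B = (A^T A)^{1/2}, symmetric
dB = sqrtm((A + dA).T @ (A + dA)) - B     # finite-difference differential of B

I = np.eye(n)
M = np.kron(B, I) + np.kron(I, B)
N = np.kron(A.T, I) @ commutation_matrix(m, n) + np.kron(I, A.T)

lhs = M @ dB.flatten(order="F")           # M db
rhs = N @ dA.flatten(order="F")           # N da  -- should agree to first order
```

To first order in $\|dA\|$, `lhs` and `rhs` agree, confirming $M\,db = N\,da$.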

Write the function as $$\eqalign{ F &= (A^TA)^{-1/2}D(A^TA)^{-1/2} \quad\implies\quad BFB = D \\ }$$ then calculate its vectorized differential and gradient $$\eqalign{ B\,dF\,B &= -\big(dB\;FB + BF\;dB\big) \\ \big(B\otimes B\big)\,df &= -\big(BF^T\otimes I + I\otimes BF\big)\,db \\ P\,df &= -Q\,db \\ &= -QM^{-1}N\,da \\ df &= -P^{-1}QM^{-1}N\,da \\ \frac{\partial f}{\partial a} &= -P^{-1}QM^{-1}N \\ }$$ This is not quite the quantity that was requested, but hopefully it's still useful.
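The final formula can likewise be checked against finite differences. A self-contained sketch in Python/NumPy (same illustrative `commutation_matrix` helper as above; `np.linalg.solve` is used in place of explicit inverses):

```python
import numpy as np
from scipy.linalg import sqrtm

def commutation_matrix(m, n):
    # K @ vec(X) = vec(X.T) for X of shape (m, n), vec taken column-major
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))
D = np.diag(rng.standard_normal(n))

def F_of(A):
    # F = (A^T A)^{-1/2} D (A^T A)^{-1/2}
    S = np.linalg.inv(sqrtm(A.T @ A))
    return S @ D @ S

B = sqrtm(A.T @ A)
F = F_of(A)
I = np.eye(n)
M = np.kron(B, I) + np.kron(I, B)
N = np.kron(A.T, I) @ commutation_matrix(m, n) + np.kron(I, A.T)
P = np.kron(B, B)
Q = np.kron(B @ F.T, I) + np.kron(I, B @ F)

# df/da = -P^{-1} Q M^{-1} N, built with solves instead of inverses
grad = -np.linalg.solve(P, Q) @ np.linalg.solve(M, N)

# finite-difference check of the gradient
dA = 1e-6 * rng.standard_normal((m, n))
df_fd = (F_of(A + dA) - F_of(A)).flatten(order="F")
df_pred = grad @ dA.flatten(order="F")
```

The predicted differential `df_pred` matches the finite-difference one to first order.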


The desired form of the gradient can be obtained by defining the tensors $$\eqalign{ \n_{\ell jk} &= \begin{cases} \o\quad{\rm if}\;\;\ell=j+km-m \\ 0\quad{\rm otherwise} \\ \end{cases} \\ \m_{jk\ell} &= \n_{\ell jk} \\ }$$ with index ranges $$\o\le j\le m, \qquad \o\le k\le n, \qquad \o\le \ell\le mn$$ Using these, a matrix $V\in{\mathbb R}^{m\times n}$ can be converted to its vectorized form $v\in{\mathbb R}^{mn}$ and back $$\eqalign{ v &= \n:V \;=\; V:\m \;=\; {\rm vec}(V) \\ V &= \m\cdot v \;=\; v\cdot\n \;=\; {\rm unvec}(v) \\ }$$ Applying these to the vectorized gradient recovers the full fourth-order gradient $$\frac{\partial F}{\partial A} = \m\cdot\left(\frac{\partial f}{\partial a}\right)\cdot\n$$ Contracting with the single-entry matrix $E_{ij}=e_ie_j^T$ then yields $$A_{ij} = A:E_{ij} \quad\implies\quad \frac{\partial F}{\partial A_{ij}} = \left(\frac{\partial F}{\partial A}\right):E_{ij}$$ which is the requested gradient.
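In practice the $\n$ and $\m$ contractions are just column-major vec/unvec. A small illustration in Python/NumPy, where `order="F"` gives the column-stacking that matches $\ell=j+km-m$:

```python
import numpy as np

m, n = 2, 3
V = np.arange(1, m * n + 1).reshape(m, n)   # a small test matrix

v = V.flatten(order="F")           # vec(V): stack columns; entry V[j,k] lands at v[j + k*m] (0-based)
V_back = v.reshape(m, n, order="F")  # unvec(v) recovers V
```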


If you're doing this on a computer, rather than using the tensor approach sketched above, just use the reshape() function built into most languages. For example, in Julia one would write

# recover full gradient from vectorized gradient 
dF_dA = reshape(df_da, size(F)..., size(A)...)

# access the (i,j)th matrix component of this gradient
M = dF_dA[:,:,i,j]
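The NumPy equivalent is a sketch along the same lines; `order="F"` reproduces Julia's column-major reshape, and `df_da` below is a random stand-in for the vectorized gradient computed earlier (for $A\in{\mathbb R}^{m\times n}$, $F$ is $n\times n$, so `df_da` has shape $(n^2, mn)$):

```python
import numpy as np

m, n = 4, 3
# stand-in for the vectorized gradient df/da of shape (n*n, m*n)
df_da = np.random.default_rng(2).standard_normal((n * n, m * n))

# recover the full fourth-order gradient from the vectorized one
dF_dA = df_da.reshape(n, n, m, n, order="F")

# access the (i,j)-th matrix component of this gradient (0-based here)
i, j = 1, 2
Mij = dF_dA[:, :, i, j]
```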