Derivative of matrix-valued function $W \mapsto W W^T$


What is the derivative of matrix-valued function $W \mapsto W W^T$? I have checked the entire Internet and can't find a concrete answer.

Assuming that $\pmb W$ is an orthogonal factor-loading matrix, is it simply

$$ \dfrac{\partial \pmb{WW}^T}{\partial \pmb{W}} = 2\pmb{W}\,? $$

2 Answers

We have the mapping $f: \mathbb R^{n \times n} \to \mathbb R^{n \times n}$ given by

$$f(W)=WW^T.$$

Let $|| \cdot ||$ be any norm on $\mathbb R^{n \times n}$ (all norms on $\mathbb R^{n \times n}$ are equivalent!).

Let $W_0 \in \mathbb R^{n \times n}$. Then $f$ is differentiable at $W_0$ if there is a linear mapping $L: \mathbb R^{n \times n} \to \mathbb R^{n \times n}$ (depending on $W_0$) such that

$$(*) \quad \frac{f(W_0+H)-f(W_0)-L(H)}{||H||} \to 0$$

as $H \to 0.$ In this case $L$ is uniquely determined and is called the derivative of $f$ at $W_0$. In symbols: $f'(W_0)=L.$

Now define $L$ by $L(V):=W_0V^T+VW_0^T$. Then $f(W_0+H)-f(W_0)-L(H)=HH^T$, and since $||HH^T|| \le C\,||H||^2$ for some constant $C>0$ (by equivalence with a submultiplicative norm), the quotient in $(*)$ tends to $0$ as $H \to 0$.

Hence $f'(W_0)=L.$
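The limit $(*)$ is easy to check numerically. Here is a small sketch of my own (not part of the answer), with arbitrary random $W_0$ and $H$: as $H$ shrinks, the remainder quotient decays linearly, as expected from the $HH^T$ remainder term.

```python
import numpy as np

# Sketch (my own, hypothetical names): verify that L(H) = W0 H^T + H W0^T
# satisfies the Frechet-derivative limit (*) for f(W) = W W^T.
rng = np.random.default_rng(0)
n = 4
W0 = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

def f(W):
    return W @ W.T

def L(V):
    return W0 @ V.T + V @ W0.T

# The remainder f(W0+H) - f(W0) - L(H) equals H H^T exactly, so the
# quotient ||H H^T|| / ||H|| shrinks linearly with the scale of H.
for t in [1.0, 1e-2, 1e-4]:
    Ht = t * H
    r = np.linalg.norm(f(W0 + Ht) - f(W0) - L(Ht)) / np.linalg.norm(Ht)
    print(f"t = {t:8.0e}   remainder quotient = {r:.3e}")
```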


$ \def\a{{\cal E}}\def\b{{\cal F}} \def\g{{\cal G}}\def\d{{\delta}} \def\B{\Big}\def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\B(#1\B)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\iff\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\c#1{\color{red}{#1}} $Define the fourth-order isotropic tensors $\{\a,\g\}$ in terms of Kronecker deltas $\{\d\}$ $$\eqalign{ \d_{jk}\d_{\ell m} = \a_{j\ell km} = \g_{j\ell mk}\\ }$$ Then calculate the differential and gradient of your matrix-valued function $$\eqalign{ F &= WW^T \\ dF &= dW\,W^T + W\,dW^T \\ &= \a\,W:dW + W\a:dW^T \\ &= \BR{\a\,W + \LR{W\a}:\g}:dW \\ \grad{F}{W} &= {\a\,W + \LR{W\a}:\g} \\ }$$ As you can see, the gradient is a fourth-order tensor, which is awkward.
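To make the fourth-order gradient concrete, here is a numerical sketch of my own (not from the answer): assemble the tensor with components $G_{ijk\ell}=\delta_{ik}W_{j\ell}+W_{i\ell}\delta_{jk}$ and check that contracting it against an arbitrary $dW$ reproduces the differential $dF = dW\,W^T + W\,dW^T$.

```python
import numpy as np

# Sketch (my own, hypothetical names): build the fourth-order gradient
# G[i,j,k,l] = dF_ij / dW_kl and contract it with an arbitrary dW.
rng = np.random.default_rng(1)
n = 3
W = rng.standard_normal((n, n))
dW = rng.standard_normal((n, n))
I = np.eye(n)

# G_{ijkl} = delta_{ik} W_{jl} + W_{il} delta_{jk}
G = np.einsum('ik,jl->ijkl', I, W) + np.einsum('il,jk->ijkl', W, I)

# Contracting G over (k,l) with dW must give dF = dW W^T + W dW^T
dF_from_G = np.einsum('ijkl,kl->ij', G, dW)
dF_direct = dW @ W.T + W @ dW.T
print(np.allclose(dF_from_G, dF_direct))  # True
```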

Another way to proceed is to note that the matrix self-gradient is given by $$\eqalign{ \grad{W}{W_{k\ell}} = E_{k\ell} \qquad \grad{W_{ij}}{W_{k\ell}} = \a_{ijk\ell} \\ }$$ where $E_{k\ell}$ is the single-entry matrix, all of whose elements equal zero except for the $(k,\ell)$ element, which equals one.

Then direct element-wise differentiation yields $$\eqalign{ F &= WW^T \\ \grad{F}{W_{k\ell}} &= \LR{E_{k\ell}}W^T + W\LR{E_{k\ell}}^T\\ }$$ Or you could opt for pure index notation (and Einstein summation convention) $$\eqalign{ F_{ij} &= W_{ip}W_{jp} \\ \grad{F_{ij}}{W_{k\ell}} &= {\a_{ipk\ell}W_{jp} + W_{ip}\a_{jpk\ell}} \\ &= \d_{ik}W_{j\ell} + W_{i\ell}\d_{jk} \\ }$$ Another common approach is to vectorize the matrix equation $$\eqalign{ dF &= dW\,W^T + W\,dW^T \\ \vecc{dF} &= \LR{W\otimes I}\vecc{dW} + \LR{I\otimes W}\vecc{dW^T} \\ &= \BR{\LR{W\otimes I} + \LR{I\otimes W}K}\,\vecc{dW} \\ \grad{\vecc{F}}{\vecc{W}} &= {\LR{W\otimes I} + \LR{I\otimes W}K} \\ }$$ where $K$ is the Commutation Matrix associated with Kronecker products.
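The vectorized Jacobian can likewise be checked numerically. A minimal NumPy sketch of my own (assuming the column-major $\operatorname{vec}$ convention standard in matrix calculus, with $K$ built as an explicit permutation matrix):

```python
import numpy as np

# Sketch (my own, hypothetical names): check the vectorized Jacobian
# (W (x) I) + (I (x) W) K against the differential dF = dW W^T + W dW^T.
rng = np.random.default_rng(2)
n = 3
W = rng.standard_normal((n, n))
dW = rng.standard_normal((n, n))
I = np.eye(n)

vec = lambda A: A.flatten(order='F')   # column-major vec(.)

# Commutation matrix K, defined by K @ vec(A) == vec(A^T):
# entry A[i,j] sits at vec-index i + n*j, and at j + n*i in vec(A^T).
K = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        K[j + n * i, i + n * j] = 1.0

# Jacobian d vec(F) / d vec(W)
J = np.kron(W, I) + np.kron(I, W) @ K

print(np.allclose(J @ vec(dW), vec(dW @ W.T + W @ dW.T)))  # True
```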