What is the derivative of $||BB^T - Q||_2$ with respect to the matrix $B$?

I'm trying to optimize a function that includes the term $||BB^T - Q||_2$, where $B, Q \in \mathbb{R}^{n \times n}$ (other conditions on $B$ are enforced by regularization). One way to approach this is to form a generalized energy function and perform gradient descent. However, I'm not familiar with taking derivatives with respect to a matrix (here $B$), and the Matrix Cookbook doesn't seem to offer clear guidance on this case.

Is this even a sensible approach? If so, is there any advice for how to approach this and specifically the $||BB^T - Q||_2$ term?

Thank you.

There is 1 best solution below

$ \def\s{\sigma} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\spec#1{\left\| #1 \right\|_S} \def\two#1{\left\| #1 \right\|_2} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} $The Frobenius product $(:)$ is a convenient notation for the trace, with the following properties:
$$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ \LR{AB}:C &= A:\LR{CB^T} = B:\LR{A^TC} \\ }$$

If $\two{\cdot}$ denotes the Frobenius norm, then with $M = BB^T-Q$ the gradient can be calculated as
$$\eqalign{ \phi &= \frob{M} \\ \phi^2 &= M:M \\ 2\phi\:d\phi &= 2M:\LR{dB\,B^T+B\:dB^T} \\ &= 2\LR{M+M^T}B:dB \\ d\phi &= \phi^{-1}\LR{M+M^T}B:dB \\ \grad{\phi}{B} &= \phi^{-1}\LR{M+M^T}B \\ }$$
If $Q$ is symmetric, then $M^T=M$ and this simplifies to $\grad{\phi}{B} = 2\phi^{-1}\LR{BB^T-Q}B$. In either case the gradient is undefined at $\phi=0$.

However, if $\two{\cdot}$ denotes the spectral norm, then one must use the SVD. Assuming the largest singular value $\s_1$ is simple (not repeated), so that $\phi$ is differentiable at $X$,
$$\eqalign{ X &= \sum_{k=1}^n \s_k u_k v_k^T &\qiq \spec{X} = \max_k \LR{\s_k} \\ \phi &= \spec{X} = \s_1 &\qiq d\phi = \LR{u_1 v_1^T}:dX \\ }$$
Then substitute for $X$
$$\eqalign{ X &= {BB^T-Q} \\ d\phi &= \LR{u_1 v_1^T}:\LR{dB\,B^T + B\:dB^T} \\ &= \LR{u_1 v_1^T + v_1 u_1^T}B:dB \\ \grad{\phi}{B} &= \LR{u_1 v_1^T + v_1 u_1^T}B \\ }$$
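
Before using either gradient in a gradient-descent loop, it can help to check it numerically. The following is a minimal NumPy sketch, not part of the derivation above: the function names (`frob_grad`, `spec_grad`, `numerical_grad`), the random test matrices, and the step size `eps` are all illustrative assumptions. The Frobenius gradient is written in the form $\phi^{-1}\left(M+M^T\right)B$ with $M=BB^T-Q$, which reduces to $2\phi^{-1}\left(BB^T-Q\right)B$ when $Q$ is symmetric. The spectral-norm gradient assumes the largest singular value is not repeated.

```python
import numpy as np

def frob_grad(B, Q):
    """Gradient of ||B B^T - Q||_F w.r.t. B (Q need not be symmetric)."""
    M = B @ B.T - Q
    phi = np.linalg.norm(M, "fro")  # assumed nonzero
    return (M + M.T) @ B / phi

def spec_grad(B, Q):
    """Gradient of the spectral norm of B B^T - Q w.r.t. B.

    Assumes the largest singular value is simple (not repeated).
    """
    X = B @ B.T - Q
    U, s, Vt = np.linalg.svd(X)      # singular values in descending order
    G = np.outer(U[:, 0], Vt[0, :])  # u_1 v_1^T
    return (G + G.T) @ B

def numerical_grad(f, B, eps=1e-6):
    """Central finite-difference approximation of df/dB, entry by entry."""
    G = np.zeros_like(B)
    for i in range(B.shape[0]):
        for j in range(B.shape[1]):
            E = np.zeros_like(B)
            E[i, j] = eps
            G[i, j] = (f(B + E) - f(B - E)) / (2 * eps)
    return G

# Hypothetical test data: random 4x4 matrices with a fixed seed.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
Q = rng.standard_normal((n, n))

frob = lambda B_: np.linalg.norm(B_ @ B_.T - Q, "fro")
spec = lambda B_: np.linalg.norm(B_ @ B_.T - Q, 2)  # largest singular value

# Largest entrywise discrepancy between analytic and numerical gradients.
err_frob = np.max(np.abs(frob_grad(B, Q) - numerical_grad(frob, B)))
err_spec = np.max(np.abs(spec_grad(B, Q) - numerical_grad(spec, B)))
print(err_frob, err_spec)
```

Both printed discrepancies should be small compared with the size of the gradient entries. If the spectral-norm check fails for a particular $B$, one likely cause is that the two largest singular values of $BB^T-Q$ are nearly equal, in which case the norm is not differentiable there and only a subgradient is available.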