Gradient of QZ decomposition


Let $A$ and $B$ be $n \times n$ matrices of rank $k_1 \le n$ and $k_2 \le n$, respectively (the generalized Schur decomposition is defined for a square pair). Then the QZ decomposition, or generalized Schur decomposition, is $A = USV^T$ and $B = UTV^T$, where:

  • $U$ and $V$ are unitary matrices.

  • $S$ and $T$ are upper triangular matrices.

We define a function $f$ that takes $U$, $S$, $T$ and $V$ as input and returns the sum of all entries of all four matrices. I am interested in finding the gradient of $f$ with respect to $A$ and $B$.
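As a concrete reference point, the decomposition and the function $f$ can be sketched with SciPy's `scipy.linalg.qz` (its return order is $S, T, U, V$ in the notation above; `output='complex'` forces truly upper-triangular factors, since the real QZ may leave $2 \times 2$ blocks on the diagonal of $S$):

```python
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# scipy.linalg.qz returns (AA, BB, Q, Z) with A = Q @ AA @ Z^H;
# in the question's notation: S = AA, T = BB, U = Q, V = Z.
S, T, U, V = qz(A, B, output='complex')

# Verify A = U S V^H and B = U T V^H.
assert np.allclose(U @ S @ V.conj().T, A)
assert np.allclose(U @ T @ V.conj().T, B)

# f: sum of all entries of all four factors (real for a real input pair).
f = (U.sum() + S.sum() + T.sum() + V.sum()).real
```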


I found the following identities useful, from reading some posts here and some papers, but couldn't complete the derivation. \begin{align*} dA &= dU\,SV^{T} + U\,dS\,V^{T} + US\,dV^{T} \newline U^{T}dA\,V &= U^{T}dU\,S + dS + S\,dV^{T}V \newline U^{T}dA\,VS^{-1} &= U^{T}dU + dS\,S^{-1} + S\,dV^{T}VS^{-1} \newline \end{align*} Since $U^{T}dU$ and $dV^{T}V$ are skew-symmetric, we get \begin{align*} U^{T}dA\,VS^{-1} - dS\,S^{-1} - S\,dV^{T}VS^{-1} = -(U^{T}dA\,VS^{-1} - dS\,S^{-1} - S\,dV^{T}VS^{-1})^T \end{align*} which rearranges to \begin{align*} U^{T}dA\,VS^{-1} - S\,dV^{T}VS^{-1} + (U^{T}dA\,VS^{-1} - S\,dV^{T}VS^{-1})^T = dS\,S^{-1} + (dS\,S^{-1})^T, \end{align*} i.e., writing $\operatorname{sym}(X) = \tfrac{1}{2}(X + X^T)$, $$\operatorname{sym}(C) = \operatorname{sym}(dS\,S^{-1})$$ where $C = U^{T}dA\,VS^{-1} - S\,dV^{T}VS^{-1}$. Since $dS\,S^{-1}$ is upper triangular, we can write $dS$ as $dS = (\operatorname{sym}(C) \circ E^T)\,S$, where $\circ$ is the Hadamard product and $E$ is given by $$ e_{ij}= \begin{cases} 0,& i < j\\ 1, & i=j\\ 2,& i > j \end{cases} $$
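The final masking step can be sanity-checked in isolation: with $\operatorname{sym}(X) = \tfrac12(X+X^T)$, the Hadamard mask $E^T$ recovers any upper-triangular matrix (here playing the role of $dS\,S^{-1}$) from its symmetric part. A minimal NumPy sketch:

```python
import numpy as np

n = 5
rng = np.random.default_rng(1)

# E as defined above: 0 above the diagonal, 1 on it, 2 below it.
i, j = np.indices((n, n))
E = np.where(i < j, 0.0, np.where(i == j, 1.0, 2.0))

def sym(X):
    # Symmetric part: sym(X) = (X + X^T) / 2.
    return 0.5 * (X + X.T)

# For any upper-triangular M, masking sym(M) with E^T recovers M exactly:
# off-diagonal entries are doubled back, diagonal entries kept, lower zeroed.
M = np.triu(rng.standard_normal((n, n)))
assert np.allclose(sym(M) * E.T, M)
```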

Let the derivatives of the output with respect to $A$, $B$, $U$, $V$, $S$ and $T$ be $\bar{A}$, $\bar{B}$, $\bar{U}$, $\bar{V}$, $\bar{S}$ and $\bar{T}$. Then $$\operatorname{Tr}(\bar{A}\,dA) = \operatorname{Tr}(\bar{U}\,dU) + \operatorname{Tr}(\bar{S}\,dS) + \operatorname{Tr}(\bar{V}\,dV).$$ If I substitute $dS$ into this formula, the RHS is in terms of $dV$ and $dA$. I am trying to find a way to eliminate $dV$ so that the RHS is only in terms of $dA$; then I can equate coefficients and find $\bar{A}$.

  1. In the above equations, $U^{T}dU$ and $dV^{T}V$ are skew-symmetric and $dS$ is upper triangular.
  2. I am trying to eliminate $dV$ or $dU$ so that $dS$ and $dT$ depend only on $dA$ and $dB$; then I can recover $dU$ and $dV$.
  3. Is there any way to simplify $C$?

I am not sure how to jointly solve for $dU$ and $dV$ such that the answer uses $dA$ and $dB$.

Some references:

Gradient of $A \mapsto \sigma_i (A)$

https://arxiv.org/pdf/1509.07838.pdf

https://j-towns.github.io/papers/qr-derivative.pdf

Edit 1: The main purpose of the question is to find gradients for backpropagation through QZ decomposition.


$ \def\l{\lambda}\def\s{\sigma}\def\e{\varepsilon} \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\deriv#1#2{\frac{d #1}{d #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Here is a partial answer, assuming that the change in $A$ is in the direction of $B$, so that $$\eqalign{ dA &= (B-A)\:d\l \qiq \c{dU = dV = 0} \\ }$$ Let $J_X$ denote an all-ones matrix the same size as $X$; then the cost function is $$\eqalign{ \phi &= J_S:S + J_T:T + J_U:U + J_V:V \\ }$$ Differentiating with respect to $A$ alone simplifies the problem, since $\:dB=0\implies\c{dT=0}$ $$\eqalign{ d\phi &= J_S:dS + J_T:\c{dT} + J_U:\c{dU} + J_V:\c{dV} \\ &= J_S:dS \\ &= J_S:(U^TdA\,V) \\ &= (UJ_SV^T):dA \\ &= (UJ_SV^T):(B-A)\,d\l \\ \grad{\phi}{A} &= UJ_SV^T \\ \deriv{\phi}{\l} &= (UJ_SV^T):(B-A) \\ }$$
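Under the answer's simplifying assumption ($U$ and $V$ frozen, so $dS = U^T dA\,V$), the closed form $\frac{d\phi}{d\l} = (UJ_SV^T):(B-A)$ can be checked numerically. In the sketch below, `phi` is a hypothetical stand-in for the $J_S:S$ term of the cost, with the factors frozen at $\l = 0$; the finite difference is exact up to roundoff because `phi` is then linear in $\l$:

```python
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

S, T, U, V = qz(A, B)  # real QZ: A = U S V^T, B = U T V^T

# Freeze U and V (the assumption dU = dV = 0) and move A toward B:
# S(lam) = U^T (A + lam (B - A)) V, phi(lam) = sum of entries of S(lam).
phi = lambda lam: (U.T @ (A + lam * (B - A)) @ V).sum()

# Closed form from the answer: dphi/dlam = (U J_S V^T) : (B - A).
J_S = np.ones((n, n))
dphi = (U @ J_S @ V.T * (B - A)).sum()

# Compare against a finite difference of the (linear) frozen objective.
fd = (phi(1e-3) - phi(0.0)) / 1e-3
assert np.isclose(dphi, fd)
```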


In the above, a colon is used to denote the Frobenius product, which is a concise notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$ The properties of the underlying trace function allow the terms in such a product to be rearranged in many different ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \\&= \LR{A^TC}:B \\ }$$
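These Frobenius-product identities are easy to confirm numerically, e.g.:

```python
import numpy as np

rng = np.random.default_rng(3)

# Frobenius product A:B = sum of elementwise products = Tr(A^T B).
frob = lambda X, Y: (X * Y).sum()

P = rng.standard_normal((3, 4))
Q = rng.standard_normal((3, 4))
assert np.isclose(frob(P, Q), np.trace(P.T @ Q))          # A:B = Tr(A^T B)
assert np.isclose(frob(P, P), np.linalg.norm(P, 'fro')**2)  # A:A = ||A||_F^2

# Rearrangement rules, with shapes chosen so all products are defined.
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((3, 5))
assert np.isclose(frob(C, A @ B), frob(C @ B.T, A))  # C:(AB) = (CB^T):A
assert np.isclose(frob(C, A @ B), frob(A.T @ C, B))  # C:(AB) = (A^T C):B
```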