Derivatives of Trace of $\operatorname{SU}(2)$ Matrices by Another $\operatorname{SU}(2)$


In short, my question is how to take derivatives of functions that look like $$\mathrm{Tr}(UM)$$ or $$\mathrm{Tr}(U^{\dagger}M)$$ with respect to $U$, where $U$ and $M$ are $\mathrm{SU}(2)$ matrices. The longer story is that I am trying to derive force terms that are used in hybrid Monte Carlo simulations. A lot of that is physics, so I don't want to post most of it here. That said, if this is an example of an XY problem, I would love some input. Most papers that I have come across haven't done a very good job of explaining exactly how these terms are derived. So, in longer format, the derivative that I am trying to take is of the following function: $$S = \frac{4}{g^2}\sum_{x,\mu < \nu}\left( 1 -\frac{1}{2}\mathrm{Tr}\left(U_{\mu}(x)U_{\nu}(x+a\hat{e}_{\mu})U_{\mu}^{\dagger}(x+a\hat{e}_{\nu})U_{\nu}^{\dagger}(x)\right) \right) $$ with respect to $A_{\mu}(x)$, where $A_{\mu}(x) = \sum_aA_{\mu}^a(x)\sigma^a$ and $\sigma^a$ are the Pauli matrices, so that $A_{\mu}(x)$ is Hermitian. The matrices $U$ are defined by the $A$ matrices as $U_{\mu}(x) = e^{igA_{\mu}(x)}$. $\mu$ and $\nu$ are indices that run from $1$ to $4$. Of course, what I'm really trying to find is $$\frac{\partial S}{\partial A_{\mu}(x)}$$ but that simplifies with the chain rule to $$ \frac{\partial U_{\mu}(x)}{\partial A_{\mu}(x)} \frac{\partial S}{\partial U_{\mu}(x)} = igU_{\mu}(x)\frac{\partial S}{\partial U_{\mu}(x)}. $$


There is 1 best solution below


$ \def\R#1{{\mathbb R}^{#1}} \def\o{{\tt1}} \def\s{\sigma} \def\mbrace#1{\left\lbrace\begin{array}{r}#1\end{array}\right\rbrace} \def\m#1{\left[\begin{array}{r|r}#1\end{array}\right]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\qif{\quad\iff\quad} \def\frob#1{\left\| #1 \right\|_F} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} $Let $c\in\R{4}$ be an arbitrary vector and construct the associated quaternion $$\eqalign{ c &= \mbrace{c_0\\c_1\\c_2\\c_3} \qiq C &= \m{c_0+ic_1 & c_2+ic_3\\\hline -c_2+ic_3 & c_0-ic_1} \\ }$$ which can also be written with respect to a basis consisting of Pauli matrices $\{\s_k\}$ $$\eqalign{ C &= s_0\,\s_0 + s_1\,\s_1 + s_2\,\s_2 + s_3\,\s_3 \\ s_0 &= c_0, \quad{\rm but}\quad s_k \ne c_k\;\:\{{\rm for}\;k=1,2,3\} \\ }$$ Normalizing the vector generates a unit quaternion, i.e. an $SU(2)$ matrix $$\eqalign{ &u = \frac{c}{\|c\|} = \mbrace{u_0\\u_1\\u_2\\u_3} \qiq U = \m{u_0+iu_1 & u_2+iu_3\\\hline-u_2+iu_3 & u_0-iu_1} \\ &U^\dagger U = I, \qquad \det(U) = (u_0^2+u_1^2)+(u_2^2+u_3^2) = \o \\ }$$ Let's use a colon to denote the matrix inner product $$\eqalign{ C:B &= \sum_{i=1}^m\sum_{j=1}^n C_{ij}B_{ij} \;=\; \trace{C^TB} \\ C:B &= B:C \qquad \{ {\rm \,commutes\,} \} \\ C^*:C &= \frob{C}^2 \qquad \{ {\rm \,Frobenius\;norm\,} \} \\ }$$ The vector and matrix representations of two arbitrary quaternions satisfy $$\eqalign{ C^*:B \;=\; 2\,c:b \;=\; 2\,b:c \;=\; B^*:C \\ }$$ and therefore $$\eqalign{ \frob{C} &= \sqrt{2}\:\frob{c} \qiq U = \frac{C}{\frob{c}} = \sqrt{2}\,\LR{\frac{C}{\frob{C}}} \\ \frob{U} &= \sqrt{2} \\ }$$ Using the quotient rule, the differential of the unit vector can be calculated as $$\eqalign{ du &= \LR{\frac{I-uu^T}{\|c\|}} dc \\ }$$ Since $u^Tdu=0$, the updated vector $\LR{u_+=u+du}$ remains a unit vector (to first order in $dc$) and therefore corresponds to an updated $SU(2)$ matrix.
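The identities above are easy to check numerically. The following is a minimal NumPy sketch, not part of the original derivation; the helper name `quat_matrix` and the random seed are hypothetical choices made only for illustration.

```python
import numpy as np

def quat_matrix(c):
    """Return the 2x2 complex matrix built from c = (c0, c1, c2, c3)."""
    c0, c1, c2, c3 = c
    return np.array([[c0 + 1j * c1, c2 + 1j * c3],
                     [-c2 + 1j * c3, c0 - 1j * c1]])

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
c = rng.normal(size=4)
b = rng.normal(size=4)
C, B = quat_matrix(c), quat_matrix(b)

# C^* : B = Tr(C^dagger B) should equal 2 * (c . b)
inner_matrix = np.trace(C.conj().T @ B)
inner_vector = 2.0 * (c @ b)

# Normalizing c gives a unit quaternion, i.e. an SU(2) matrix
u = c / np.linalg.norm(c)
U = quat_matrix(u)
unitarity_error = np.linalg.norm(U.conj().T @ U - np.eye(2))
det_U = np.linalg.det(U)
frob_U = np.linalg.norm(U)  # Frobenius norm is the default for a 2-D array

print(abs(inner_matrix - inner_vector), unitarity_error, det_U, frob_U)
```

If the construction is right, the first two printed numbers should be at round-off level, the determinant should be close to $1$, and the Frobenius norm close to $\sqrt{2}$.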

Finally, if you define the quaternion $$B=M^\dagger \qiq B^*=M^T$$ then your first function can be dispatched as follows $$\eqalign{ \phi &= \trace{MU} \\ &= B^*:U \\ &= 2\,b:u \qquad \{ {\rm vector\;form} \} \\ \\ d\phi &= 2\,b : du \\ &= 2\,b : \LR{\frac{I-uu^T}{\|c\|}}dc \\ &= 2 \LR{\frac{I-uu^T}{\|c\|}}b : dc \\ \\ \grad{\phi}{c} &= 2 \LR{\frac{I-uu^T}{\|c\|}}b \\ }$$ This is the gradient of the function with respect to an unconstrained $\R{4}$ vector, which is not what you asked for but is what you need if you plan to do any sort of optimization.
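As a sanity check on this gradient, one can compare it with central finite differences. The sketch below assumes NumPy; the names `quat_matrix`, `phi`, and `grad_phi`, the seed, and the step size `h` are hypothetical and chosen only for illustration. It uses the fact that if $M$ is built from the unit vector $m$, then $B=M^\dagger$ is built from $b=(m_0,-m_1,-m_2,-m_3)$.

```python
import numpy as np

def quat_matrix(c):
    """Return the 2x2 complex matrix built from c = (c0, c1, c2, c3)."""
    c0, c1, c2, c3 = c
    return np.array([[c0 + 1j * c1, c2 + 1j * c3],
                     [-c2 + 1j * c3, c0 - 1j * c1]])

def phi(c, M):
    """phi(c) = Tr(M U), with U built from the normalized vector u = c/|c|."""
    u = c / np.linalg.norm(c)
    return np.trace(M @ quat_matrix(u)).real

def grad_phi(c, b):
    """Analytic gradient 2 (I - u u^T) b / |c| with respect to c."""
    norm_c = np.linalg.norm(c)
    u = c / norm_c
    return 2.0 * (np.eye(4) - np.outer(u, u)) @ b / norm_c

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
m = rng.normal(size=4)
m /= np.linalg.norm(m)          # unit vector, so M is in SU(2)
M = quat_matrix(m)
b = np.array([m[0], -m[1], -m[2], -m[3]])  # vector form of B = M^dagger
c = rng.normal(size=4)          # unconstrained R^4 vector

h = 1e-6                        # finite-difference step (assumed value)
numeric = np.array([(phi(c + h * e, M) - phi(c - h * e, M)) / (2 * h)
                    for e in np.eye(4)])
analytic = grad_phi(c, b)
max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)
```

The analytic gradient is orthogonal to $c$, which reflects the fact that rescaling $c$ does not change $u$ and hence does not change $\phi$.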

Once the optimal vector is found, it is trivial to recover the corresponding $\,U\in SU(2)$ by normalizing it and filling in the matrix template above.

However, if you really, really want the gradient with respect to $U\,$ then here it is $$\eqalign{ \grad{\phi}{U} &= B^* \\ }$$ The problem with this gradient is that it accommodates movement in any direction, including those which destroy the unitarity of $U$.

But if you're careful to only choose directions which preserve unitarity (don't ask me how), then this gradient could be useful.
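One possible way to pick such directions, offered here as an assumption rather than as part of the derivation above, is to move along $U(\epsilon)=e^{i\epsilon\sigma_a}U$, which stays in $SU(2)$ for every real $\epsilon$. Contracting the unconstrained gradient with the tangent direction $i\sigma_aU$ gives the directional derivative $i\operatorname{Tr}(M\sigma_aU)$. The NumPy sketch below checks this against finite differences; the names `quat_matrix`, `rotate`, and `phi`, the seed, and the step size are hypothetical.

```python
import numpy as np

def quat_matrix(c):
    """Return the 2x2 complex matrix built from c = (c0, c1, c2, c3)."""
    c0, c1, c2, c3 = c
    return np.array([[c0 + 1j * c1, c2 + 1j * c3],
                     [-c2 + 1j * c3, c0 - 1j * c1]])

# Pauli matrices sigma_1, sigma_2, sigma_3
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def rotate(U, a, eps):
    """Left-multiply U by exp(i eps sigma_a) = cos(eps) I + i sin(eps) sigma_a."""
    R = np.cos(eps) * np.eye(2) + 1j * np.sin(eps) * sigma[a]
    return R @ U

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only
u = rng.normal(size=4)
m = rng.normal(size=4)
U = quat_matrix(u / np.linalg.norm(u))
M = quat_matrix(m / np.linalg.norm(m))

def phi(V):
    """phi(V) = Tr(M V); real whenever V is in SU(2)."""
    return np.trace(M @ V).real

# Directional derivatives: the gradient M^T contracted with i sigma_a U
analytic_complex = np.array([1j * np.trace(M @ s @ U) for s in sigma])
analytic = analytic_complex.real

eps = 1e-6                      # finite-difference step (assumed value)
numeric = np.array([(phi(rotate(U, a, eps)) - phi(rotate(U, a, -eps))) / (2 * eps)
                    for a in range(3)])

# A finite step along any of these directions keeps the matrix unitary
V = rotate(U, 0, 0.3)
unitarity_error = np.linalg.norm(V.conj().T @ V - np.eye(2))
print(np.max(np.abs(analytic - numeric)), unitarity_error)
```

Because only three real parameters are varied, this yields three directional derivatives rather than a full $2\times 2$ matrix gradient. Whether left or right multiplication is the appropriate convention depends on the application.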