Derivative w.r.t orthogonal matrix


Let $A$ be an orthogonal matrix with elements $a_{ij}$ so that $\sum_k a_{ik} a_{jk} = \delta_{ij}$. I'd like to know what is $\frac{ \partial a_{ij} }{ \partial a_{kl}}$. If $A$ was a generic matrix (with no constraints on its elements) then the answer would be $\delta_{ik} \delta_{jl}$. However, now we have the quadratic constraint on the matrix. What is the answer in this case?

BEST ANSWER

As the previous answer noted, there is no satisfying closed-form solution to this problem, as there is in the case of a simple symmetric/skew-symmetric constraint or in the unconstrained case. However, there are several techniques you can use to solve or simplify a problem that was initially formulated in terms of an orthogonally constrained matrix.

Calculate the differential of the matrix's orthogonality condition $$\eqalign{ I &= A^TA \\ 0 &= A^TdA + dA^TA \\ &= A^TdA + (A^TdA)^T \\ A^TdA &= -(A^TdA)^T \\ }$$ Thus $(A^TdA)\,$ is skew-symmetric, which is often enough to solve most problems.
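This skew-symmetry is easy to confirm numerically along a curve in the orthogonal group. Here is a minimal NumPy sketch; the Cayley-style curve through $A_0$ is just one convenient way to stay on the group:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
S = rng.standard_normal((n, n)); S = S - S.T           # skew-symmetric generator
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))      # base orthogonal matrix
I = np.eye(n)

def A(t):
    # Cayley-style curve through A0: orthogonal for every t
    return A0 @ np.linalg.solve(I + t*S, I - t*S)

h = 1e-6
dA = (A(h) - A(-h)) / (2*h)      # tangent vector at A0
M = A0.T @ dA                    # should be skew-symmetric
assert np.allclose(M, -M.T, atol=1e-6)
```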

Another technique, sketched in the previous answer, is to describe the matrix in terms of a vector of parameters (e.g. angles) and calculate derivatives with respect to these parameters.

Yet another approach employs an unconstrained matrix $X$ to construct $A$.
First, construct the auxiliary matrix $B$ $$\eqalign{ B &= 2I + X - X^T \\ dB &= dX - dX^T &=\; -dB^T \\ }$$ Then use a Cayley transform to construct the orthogonal matrix $$\eqalign{ A &= B^{-1}B^{T} \\ dA &= B^{-1}dB^{T} + dB^{-1}B^{T} \\ &= B^{-1}dB^T - B^{-1}dB\,B^{-1}B^{T} \\ &= B^{-1}dB^T + B^{-1}dB^TA \\ &= B^{-1}dB^T(I+A) \\ &= B^{-1}(dX^T-dX)\,(I+A) \\ }$$ Introduce the fourth-order tensors $(\alpha,\beta,\lambda)$ defined by $$\eqalign{ \alpha &= \frac{\partial X}{\partial X} \qquad\implies\alpha_{ijk\ell} = \delta_{ik}\delta_{j\ell} \\ \beta &= \frac{\partial X^T}{\partial X} \quad\implies\beta_{ijk\ell} = \delta_{i\ell}\delta_{jk} \\ \\ \lambda &= B^{-1}\,\alpha\,(I+A^T) \\ \lambda_{ijk\ell} &= B^{-1}_{im}\,\alpha_{mjkn}(\delta_{n\ell}+A^T_{n\ell}) \\ &= B^{-1}_{im}\,\delta_{mk}\delta_{jn}(\delta_{\ell n}+A_{\ell n}) \\ &= B^{-1}_{ik}\,(\delta_{\ell j}+A_{\ell j}) \\ }$$ into that differential expression $$\eqalign{ dA &= B^{-1}\alpha(I+A^T):dX^T - B^{-1}\alpha(I+A^T):dX \\ &= \lambda:(\beta:dX) - \lambda:dX \\ &= \Big(\lambda:(\beta-\alpha)\Big):dX \\ \frac{\partial A}{\partial X} &= \lambda:(\beta-\alpha) \\ \frac{\partial A_{ij}}{\partial X_{k\ell}} &= \lambda_{ijpq}\;(\beta_{pqk\ell}-\alpha_{pqk\ell}) \;=\; (\lambda_{ij\ell k}-\lambda_{ijk\ell}) \\ }$$ yielding the gradient of the orthogonal matrix $A$ with respect to $X$.
Since $X$ is unconstrained, this quantity is useful for gradient-based algorithms.

A note about the properties of the tensors $(\alpha,\beta)$ in terms of the arbitrary matrices $(F,G,H)$. $$\eqalign{ F &= \alpha:F = F:\alpha \qquad&\big({\rm identity\,operator}\big) \\ G^T &= \beta:G = G:\beta \qquad&\big({\rm transpose\,operator}\big) \\ FGH^T &= \big(F\alpha H\big):G \qquad&\big({\rm rearrangement\,properties}\big) \\ &= \big(F\beta\,G\big):H \\ }$$ where the colon denotes the double-contraction product.
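These identities can be confirmed numerically with einsum. A sketch, assuming the index convention $(F\alpha H)_{ijk\ell}=F_{ip}\,\alpha_{pjkq}\,H_{q\ell}$ used in the definition of $\lambda$ above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
d = np.eye(n)
alpha = np.einsum('ik,jl->ijkl', d, d)   # identity operator
beta  = np.einsum('il,jk->ijkl', d, d)   # transpose operator
F, G, H = rng.standard_normal((3, n, n))

# alpha:F = F  and  beta:G = G^T
assert np.allclose(np.einsum('ijkl,kl->ij', alpha, F), F)
assert np.allclose(np.einsum('ijkl,kl->ij', beta, G), G.T)

# rearrangement: (F alpha H):G = F G H^T  and  (F beta G):H = F G H^T
FaH = np.einsum('ip,pjkq,ql->ijkl', F, alpha, H)
assert np.allclose(np.einsum('ijkl,kl->ij', FaH, G), F @ G @ H.T)
FbG = np.einsum('ip,pjkq,ql->ijkl', F, beta, G)
assert np.allclose(np.einsum('ijkl,kl->ij', FbG, H), F @ G @ H.T)
```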

One last technique is to use vectorization to flatten the differential expression into a standard vector expression, i.e. $$\eqalign{ da,dx &= {\rm vec}(dA),\,{\rm vec}(dX) \\ da &= \Big((I+A^T)\otimes B^{-1}\Big)(K-I)\,dx \\ \frac{\partial a}{\partial x} &= \Big((I+A^T)\otimes B^{-1}\Big)(K-I) \\ }$$ where $\otimes$ is the Kronecker product and $K$ is the commutation matrix, defined by $K\,{\rm vec}(M) = {\rm vec}(M^T)$ for any square matrix $M$.
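The flattened Jacobian can likewise be checked by finite differences. A NumPy sketch; note that ${\rm vec}$ is column-major stacking, and $K$ is built directly from its defining property:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
X = rng.standard_normal((n, n))
I = np.eye(n)
B = 2*I + X - X.T
A = np.linalg.solve(B, B.T)          # Cayley transform B^{-1} B^T
Binv = np.linalg.inv(B)

vec = lambda M: M.flatten(order='F')  # column-major vectorization

# commutation matrix: K @ vec(M) == vec(M.T)
K = np.zeros((n*n, n*n))
for i in range(n):
    for j in range(n):
        K[i + j*n, j + i*n] = 1.0

J = np.kron(I + A.T, Binv) @ (K - np.eye(n*n))

def cayley(X):
    B = 2*np.eye(len(X)) + X - X.T
    return np.linalg.solve(B, B.T)

# finite-difference Jacobian, one vec(X)-component at a time
eps = 1e-6
J_fd = np.zeros((n*n, n*n))
for col in range(n*n):
    dx = np.zeros(n*n); dx[col] = eps
    dX = dx.reshape(n, n, order='F')
    J_fd[:, col] = vec(cayley(X + dX) - cayley(X - dX)) / (2*eps)

assert np.allclose(J, J_fd, atol=1e-6)
```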

The flattened result is completely analogous to the tensor result if you note the following:
$\quad K$ corresponds to $\beta$ (the transpose operator),
$\quad I$ corresponds to $\alpha$ (the identity operator),
$\quad (H\alpha G)$ corresponds to $(G\otimes H),\;$ and
$\quad$ double-contraction corresponds to ordinary matrix multiplication.

SECOND ANSWER

Your question is not well posed, though it is difficult to explain why. What follows is an example for $n=3$.

There is a local diffeomorphism $f:K\in SK_3\rightarrow (I_3-K)(I_3+K)^{-1}\in SO_3$, where $SK_3$ is the set of $3\times 3$ skew-symmetric matrices. In particular,

let $U=f\left(\begin{pmatrix}0&-4&5\\4&0&-2\\-5&2&0\end{pmatrix}\right)=\begin{pmatrix}-18/23& 14/23& 3/23\\6/23& 3/23& 22/23\\13/23& 18/23& -6/23\end{pmatrix}\in SO_3$.
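A quick NumPy check that $f(K)$ does reproduce this $U$, and that $U$ is a rotation:

```python
import numpy as np

K = np.array([[0., -4, 5], [4, 0, -2], [-5, 2, 0]])
I = np.eye(3)
U = (I - K) @ np.linalg.inv(I + K)   # the Cayley map f

# matches the matrix quoted above (entries over a common denominator of 23)
assert np.allclose(U * 23, [[-18, 14, 3], [6, 3, 22], [13, 18, -6]])
assert np.allclose(U @ U.T, I)                 # orthogonal
assert np.isclose(np.linalg.det(U), 1.0)       # determinant +1, so U is in SO_3
```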

Thus $U=[u_{i,j}]\in SO_3$ depends on $3$ independent parameters. Locally, we can choose these parameters amongst the $(u_{i,j})$, but no two of them may lie in the same row or column. We are interested in the derivative $\frac{ \partial u_{1,2} }{ \partial u_{1,1}}$. Then $u_{1,1}$ must be one of the parameters and $u_{1,2}$ must not be (otherwise, the result is obvious).

We consider the following two parametrizations.

Choice 1: $u_{1,1},u_{2,2},u_{3,3}$. Then, at our $U$, the derivative is $\frac{ \partial u_{1,2} }{ \partial u_{1,1}}\approx 1.46246$.

Choice 2: $u_{1,1},u_{2,3},u_{3,2}$. Then, at our $U$, the derivative is $\frac{ \partial u_{1,2} }{ \partial u_{1,1}}\approx 0.666603$.

The result thus depends on the chosen local parametrization of $SO_3$.
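This dependence can be reproduced numerically. The sketch below uses the Cayley chart $f$ from above as a parametrization of $SO_3$ near $U$ and computes $\partial u_{1,2}/\partial u_{1,1}$ for each choice of fixed entries via the chain rule; the helper names (`U_of`, `jac`, `du12_du11`) are invented for illustration:

```python
import numpy as np

def U_of(p):
    # Cayley chart: skew parameters (a, b, c) -> rotation matrix
    a, b, c = p
    K = np.array([[0., -a, b], [a, 0., -c], [-b, c, 0.]])
    I = np.eye(3)
    return (I - K) @ np.linalg.inv(I + K)

p0 = np.array([4., 5., 2.])          # reproduces the matrix U of the example

def jac(entries, p, eps=1e-6):
    # finite-difference Jacobian of the selected U-entries w.r.t. (a, b, c)
    J = np.zeros((len(entries), len(p)))
    for k in range(len(p)):
        dp = np.zeros(len(p)); dp[k] = eps
        dU = (U_of(p + dp) - U_of(p - dp)) / (2*eps)
        J[:, k] = [dU[i, j] for (i, j) in entries]
    return J

def du12_du11(fixed):
    # d u_{1,2} / d u_{1,1} with the two entries in `fixed` held constant,
    # computed by the chain rule through the (a, b, c) chart
    sel = [(0, 0)] + fixed           # parameters: u_{1,1} plus the fixed pair
    g = jac([(0, 1)], p0)            # 1x3 gradient of u_{1,2}
    J = jac(sel, p0)                 # 3x3 Jacobian of the parameter triple
    return (g @ np.linalg.inv(J))[0, 0]

d1 = du12_du11([(1, 1), (2, 2)])     # Choice 1: parameters u11, u22, u33
d2 = du12_du11([(1, 2), (2, 1)])     # Choice 2: parameters u11, u23, u32
assert abs(d1 - d2) > 1e-3           # the two derivatives genuinely differ
```

The two values should match the ones reported above; the point is simply that they are not equal, even though both are "the" derivative $\partial u_{1,2}/\partial u_{1,1}$ at the same point.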

EDIT (answer to @Adam). Yes, in $SO(2)$ there is no problem, because the parametrization contains only one parameter; for example, if $U(\theta)=\begin{pmatrix}\cos(\theta)&-\sin(\theta)\\\sin(\theta)&\cos(\theta)\end{pmatrix}$ with $u_{1,1}=f(u_{2,1})>0$, then $\dfrac{ \partial u_{1,1} }{ \partial u_{2,1}}=\dfrac{-\sin(\theta)}{\cos(\theta)}=\dfrac{-u_{2,1}}{\sqrt{1-u_{2,1}^2}}$. Yet an element of $SO(3)$ depends on $3$ parameters, and you must choose these parameters before calculating a partial derivative with respect to one of them.
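A quick numerical check of the $SO(2)$ formula, differencing $u_{1,1}=\cos\theta$ against $u_{2,1}=\sin\theta$ along the circle:

```python
import numpy as np

theta = 0.7                          # any angle with cos(theta) > 0
u21 = np.sin(theta)
h = 1e-7

# finite difference of u11 as a function of u21 along the curve theta -> U(theta)
d_fd = (np.cos(theta + h) - np.cos(theta - h)) / (np.sin(theta + h) - np.sin(theta - h))
d_formula = -u21 / np.sqrt(1 - u21**2)   # the closed form -u21/sqrt(1-u21^2)
assert np.isclose(d_fd, d_formula, atol=1e-6)
```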