I am dealing with an optimization problem where I need to find an optimal rotation matrix. Let me first formulate the problem.
Input:
- An initial rotation matrix $M\in SO(3)\subset\mathbb{R}^{3\times 3}$
- A 3D point set $P$
- A corresponding ground truth point set $G$
The orthogonal matrix is used to rotate the points by matrix multiplication, along with other non-linear transformations. Suppose the transformations as a whole are denoted by $f_M(\cdot)$. Then the transformed point set is $P'=f_M(P)$. The problem is to find an optimal rotation matrix $M$ such that the transformed points $P'$ are closest to a ground truth set of points $G$, i.e.
$$ \min_{M}\sum_{i}\|G_i-f_M(P_i)\|_2^2 $$
where $P_i$ is the $i$-th point and $G_i$ is its corresponding ground truth point.
Since the problem is nonlinear, I try to solve it with gradient descent. In each iteration, the following steps are adopted:
Compute loss $J=\sum_{i}\|G_i-f_M(P_i)\|_2^2$.
Compute gradient $\partial J/\partial M$ and set $\Delta M=-\alpha(\partial J/\partial M)$ where $\alpha$ is the learning rate.
Update $M\leftarrow M+\Delta M$.
The problem is, I need $M$ to be a rotation matrix, i.e. $M\in SO(3)$. But the updating formula above is very likely to violate this constraint. I can think of two ways of solving this:
After each update, project $M$ to $SO(3)$.
Before each iteration, parametrize $M$ with parameters that naturally represent $SO(3)$, and use gradient descent on these parameters instead.
But I am not sure how to computationally obtain the projection to $SO(3)$ or the parametrization.
One possible way to do this would be to use quaternions.
If $q$ is any nonzero quaternion, then $q$ acts on the space $\Im=\{ai+bj+ck|a,b,c\in \mathbb{R}\}$ of pure-imaginary quaternions by conjugation: we send any $x \in \Im$ to $qxq^{-1}$. This map turns out to be a rotation of the three-dimensional space $\Im$, so any nonzero quaternion yields some element of $SO(3)$ (with two distinct quaternions yielding the same element if and only if they are nonzero real multiples of each other).
Since the space $\mathbb{H}$ of quaternions is just a vector space, you can do your gradient descent in that space. The only issue is that the map from $\mathbb{H}$ to $SO(3)$ is discontinuous at $0$, so you want to arrange your gradient descent never to get too close to $0$. Since multiplying a quaternion by a nonzero real number doesn't change its corresponding rotation, you can probably do this by just renormalizing your quaternion after every step (or maybe every few steps) and choosing a small enough step size that no one step will ever take you very close to $0$ when starting from a unit quaternion.