Hello I'm trying solve orthogonal procrustes problem in 3d with a help of quaternions.
Original problem is:
For matrix $A$ find orthogonal matrix $Q$ that $$||A-Q||_F =\min_{\Omega \in SO(3)} ||A-\Omega||_F $$
Observe that: $$||A-\Omega||_F^2 = \sum_{i=1}^3 ||(A-\Omega)e^i||_2^2$$
Where $e^i$ is i-th canonical basis vector.
First define few symbols: $$ q \in \mathbb{H}$$ $$e_0=1,e_1=i,e_2=j,e_3=k$$ $$q =x_0+x_1e_1+x_2e_2+x_3e_3$$ $$ q^* \text{is conjugate of }q$$ $$||q||^2 = qq^*$$ $$v_i = A_{1i}i+A_{2i}j+A_{3i}k$$
$$\partial = \sum_{i=0}^3 e_i \frac{\partial}{\partial x_i}=\sum_{i=0}^3 e_i \partial_i$$
$$F(q) = \sum_{i=1}^3 ||v_i-qe_iq^*||^2 $$ $$C(q) = qq^*-1$$
So the problem is to minimalize $F(q)$ while satisfy constraint $C(q)$
Now I would like use Lagrange multipliers, so find $\lambda$ and $q$ that:
$$\partial_i F(q) = \lambda \partial_i C(q)$$ for $i=0,1,2,3$ this can be rewriten to: $$\partial F(q) = \lambda \partial C(q)$$
Let simplify $F$
$$F(q) = \sum_{i=1}^3 (v_i-qe_iq^*)(v_i-qe_iq^*)^* = \sum_{i=1}^3 ||v_i||^2-||q||^4-(qe_iq^*v_i^*+v_iqe_i^*q^*)$$
We have to only minimalize $$F'(q) = \sum_{i=1}^3-(qe_iq^*v_i^*+v_iqe_i^*q^*)$$
But I don't know how to find $q$ and $\lambda$ that:
$$\partial F'(q) = \lambda \partial C(q)$$
I started differentiating $F'$ but I don't know how to simplify what I got.
$$\partial F'(q) = \sum_{i=1}^3-( \dot{\partial}\dot{q}e_iq^*v_i^*+\dot{\partial}qe_i\dot{q}^*v_i^*+\dot{\partial}v_i\dot{q}e_i^*q^*+\dot{\partial}v_iqe_i^*\dot{q}^*) $$
Here I explain what the dot notation means(using Einstein summation convention):
$$f(q) = f_i(q)e_i, g(q)=g_i(q)e_i$$ $$\partial (fg) = e_i \partial_i (f_je_jg_ke_k) = (\partial_i f_j)g_k e_ie_je_k + f_j(\partial_i g_k)e_ie_je_k = \dot{\partial}\dot{f}g + \dot{\partial}f\dot{g}$$
Since you mentioned you are learning Geometric Algebra, I will use that language as I'm more familiar with it. For the relation between quaternions and elements from geometric algebra such as rotors.
To simplify your objective, let's interpret it geometrically. $q e_i q^* v^*$ is the forward rotation of $e_i$ by $q$, followed by a dot product with $v_i$. The other term, $v_i q e_i^* q^* = v_i^* q^*e_iq$ is just the dot product of $v_i$ with the backward rotation of $e_i$ by $q$. So they are essentially equivalent and we only need one of the two, and so in GA we would write
$$ \langle R e_i \tilde{R} v_i \rangle = Re_i\tilde{R} \cdot v_i $$ where $R$ is the rotor we are looking for (analogous and in fact isomoprhic to $q$), and $\tilde{R}$ denotes the 'reverse' operation, which is analogous to conjugation. The angled brackets denote the operation of selecting the scalar part.
The manifold you are trying to optimize over (unit quaternions or rotors) is luckily much simpler than the manifold of special orthogonal matrices: we're dealing with a 3-sphere in 4D (with opposite points on the sphere representing the same rotation but not the same rotor/quaternion). While your aproach based on Lagrange multipliers could work, a simpler solution is to ignore the constraints and simply project the solution back to the sphere (i.e. normalize it).
So we can solve
$$ \partial_V \sum_i \langle V e_i V^{-1} v_i \rangle = 0 $$ where $V$ is the unnormalized version of $R$ or $q$. This object is known as a versor, and it performs a rotation and simultaneous uniform scaling. Notice that we need to use the inverse $V^{-1} = V \|V\|^{-2}$ in order to apply it correctly. For a rotor the inverse equals the reverse, so a reverse will suffice there.
I will now follow the derivation in (1), section 8.7.2. The differential evaluates to
$$ \begin{aligned} \sum_i \partial_V \langle v_i V e_i V^{-1} \rangle &= \sum_i \dot{\partial_V}[\dot{V} * (e_i V^{-1} v_i)] + \dot{\partial_V} [(\dot{V}^{-1} * (v_i V e_i)]\\ &=\sum_i e_i V^{-1} v_i - V^{-1} (v_i V e_i) V^{-1} \\ &= 2 V^{-1} \sum_i V e_i V^{-1} \wedge v_i \end{aligned} $$
So the optimal rotor satisfies $$ \sum_i R e_i \tilde{R} \wedge v_i = 0. $$ which says that the optimal rotation takes $e_i$ to a scalar multiple of $v_i$ (making the wedge product zero).
using the linear function $f[x] = \sum_i e_i( v_i \cdot x)$, we can rewrite this condition as: $$ \begin{aligned} -\sum_i (R e_i \tilde{R})\wedge v_i &= \partial_x \wedge (R (f[x]) \tilde{R}) \\ &= \frac{1}{2} \sum_i (\partial_x (v_i \cdot x) (R e_i \tilde{R}) - (R e_i \tilde{R})\partial_x(v_i \cdot x)) = 0 \end{aligned} $$ So we see that $R f[x] \tilde{R}$ is a symmetric function of $x$, which implies that $f$ itself is the $R$-rotated version of a symmetric function. My reference (1) states that we can take the symmetric part of $R f \tilde{R}$ out of the function, leaving the optimal rotation.
For further details do see [1], its an excellent text.
References:
[1] Geometric Algebra for Computer Science. Dorst, Fontijne, Mann.
[2] New geometric methods for computer vision. J. Lasenby, W. J. Fitzgerald, C. J. L. Doran and A. N. Lasenby.
[3] Bayesian inference and geometric algebra: an application to camera localization. C. Doran