I would be grateful if somebody could help me with the following. I am trying to find the gradient of the expression
$$f(a_1, a_2, a_3, a_4)=\Vert Ry-x \Vert,$$
where $y$ and $x$ are known $4\times 1$ column vectors and $R$ is a $4\times 4$ orthogonal rotation matrix given as a matrix exponential,
$$R = \exp(a_1 b_1+a_2 b_2+a_3 b_3+a_4 b_4),$$
where $a_1, a_2, a_3,$ and $a_4$ are scalars and $b_1, b_2, b_3,$ and $b_4$ are $4\times 4$ matrices.
I want to calculate the gradient of $f(\cdot)$ with respect to $a_1, a_2, a_3$, and $a_4$. Any hint is appreciated.
Cristian
@cristian, the essential task is to calculate the derivative of the function $\phi:t\mapsto e^{tA+B}$. If $AB=BA$, then it is straightforward: $\phi'(t)=Ae^{tA+B}$. Otherwise it is much more difficult. If $X$ is a square matrix, then let $ad(X):H\in M_n\mapsto XH-HX$ and $f:X\mapsto e^X$. Then
$Df_X:H\in M_n\rightarrow e^X\sum_{k=0}^{\infty}\dfrac{(-ad(X))^k}{(k+1)!}H$.
Finally $\phi'(t)=e^{tA+B}\sum_{k=0}^{\infty}\dfrac{(-ad(tA+B))^k}{(k+1)!}A$.
EDIT: The choice of this form of the derivative of $\exp$ is due to its simplicity; yet there are other forms:
$Df_X(H)=\sum_{n=0}^{\infty}\sum_{m=0}^{\infty}\dfrac{X^mHX^n}{(m+n+1)!}=\int_0^1e^{sX}He^{(1-s)X}ds$ and then
$\phi'(t)=\sum_{n=0}^{\infty}\sum_{m=0}^{\infty}\dfrac{(tA+B)^mA(tA+B)^n}{(m+n+1)!}=\int_0^1e^{s(tA+B)}Ae^{(1-s)(tA+B)}ds$.
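This integral form is essentially what numerical libraries implement as the Fréchet derivative of the matrix exponential. If SciPy is available (an assumption on my part), `scipy.linalg.expm_frechet` computes it directly, so $\phi'(t)$ is one call:

```python
import numpy as np
from scipy.linalg import expm_frechet

def dphi_dt_frechet(A, B, t):
    """phi'(t) for phi(t) = expm(t*A + B): the Frechet derivative
    of expm at X = t*A + B in the direction A."""
    _, dR = expm_frechet(t * A + B, A)   # returns (expm(X), D expm_X(A))
    return dR
```

This avoids any series truncation and is the more robust option for numeric work.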
I think that you cannot obtain a simpler form for $\phi'(t)$. Now, let $g:a_1\rightarrow ||Ry-x||$; then $g'(a_1)=\dfrac{1}{||Ry-x||}(Ry-x)^T\dfrac{\partial R}{\partial a_1}y$ (the constant vector $x$ contributes nothing to the derivative), where $\dfrac{\partial R}{\partial a_1}$ can be easily deduced from $\phi'(t)$. You say that $R$ is orthogonal; then are the $(b_i)$ skew-symmetric matrices? If yes, and if the $(b_i)$ are known numeric matrices, then you can explicitly calculate the $4$ eigenvalues ($\pm i\alpha,\pm i\beta$) of the skew-symmetric matrix $\sum_ia_ib_i$. If you have Maple, you can calculate $\dfrac{\partial R}{\partial a_1}(a_1,a_2,a_3,a_4)$ for numeric values of the $(a_i)$ (time of calculation with $20$ significant digits: 13").
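Putting the pieces together, the whole gradient of $f$ can be assembled numerically, one directional derivative of $\exp$ per parameter. A sketch (the helper name `grad_f` is mine; it assumes numeric $b_i$ and uses SciPy's Fréchet derivative rather than Maple):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

def grad_f(a, bs, y, x):
    """Gradient of f(a) = ||expm(sum_i a_i*b_i) @ y - x|| w.r.t. the a_i."""
    M = sum(ai * bi for ai, bi in zip(a, bs))
    R = expm(M)
    r = R @ y - x                        # residual R*y - x
    nrm = np.linalg.norm(r)
    g = np.empty(len(a))
    for i, bi in enumerate(bs):
        _, dR = expm_frechet(M, bi)      # dR = dR/da_i
        g[i] = r @ (dR @ y) / nrm        # (Ry-x)^T (dR/da_i) y / ||Ry-x||
    return g
```

(Note the formula breaks down when $Ry=x$, where the norm is not differentiable.)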
About $ad$, the adjoint operator: $ad(X)=X\otimes I-I\otimes X^T$, $(ad(X))^2=X^2\otimes I+I\otimes (X^2)^T-2X\otimes X^T,\cdots$ (if the vectorization of a matrix is formed by stacking its ROWS into a single column vector; cf. http://en.wikipedia.org/wiki/Kronecker_product).
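The Kronecker identity above is easy to verify numerically; row-stacking vectorization is just NumPy's default C-order reshape (the variable names here are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
I = np.eye(n)

vec = lambda M: M.reshape(-1)                  # stack ROWS into one long vector

ad_matrix = np.kron(X, I) - np.kron(I, X.T)    # matrix representing ad(X)
assert np.allclose(ad_matrix @ vec(H), vec(X @ H - H @ X))
```

With the column-stacking convention used elsewhere, the representation would instead be $I\otimes X - X^T\otimes I$, which is why the ROWS caveat matters.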