Derivative of a matrix-vector product with respect to $3$D rotation matrices.


I have a class of probability densities that are indexed by $3$D rotation matrices. I am working on an estimation problem, so I want to find the Fisher information, but that requires taking derivatives of the densities with respect to the rotation matrices.

Let $Q\in \mathbb{R}^{3\times 3}$ be a $3$D rotation matrix, and let $f(Q) = Y_1(Q)\cdots Y_n(Q)\, x \in \mathbb{R}^{n}$ be a function that maps a rotation matrix to a vector in $\mathbb{R}^n$, where each $Y_i(Q)$ is a matrix-valued function of $Q$ and $x$ is a constant vector. What is the derivative of $f$ with respect to $Q$:

$$\frac{\partial f}{\partial Q}\,?$$

Thank you!


2 Answers

BEST ANSWER

In general, if you have a function $F:V\to W$ between two normed vector spaces (say over the field $\Bbb{R}$), then given a point $\alpha\in V$, the derivative $DF_{\alpha}$ (also commonly denoted $dF_{\alpha}$) is by definition a linear transformation $V\to W$; i.e. $DF_{\alpha} \in \mathcal{L}(V,W)$, so you can evaluate it on a vector $v\in V$ to get $DF_{\alpha}(v) \in W$.
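As a concrete illustration of this definition (a hypothetical example, not from the question): for $F(A) = A^2$ on $3\times 3$ matrices, the product rule gives $DF_A[\xi] = A\xi + \xi A$, and evaluating this linear map on a direction $\xi$ can be checked against a central finite difference:

```python
import numpy as np

# Hypothetical illustration: F(A) = A @ A on 3x3 matrices.
# Its derivative at A is the linear map DF_A[xi] = A @ xi + xi @ A.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
xi = rng.standard_normal((3, 3))   # the "vector" on which we evaluate DF_A

def F(A):
    return A @ A

DF = A @ xi + xi @ A               # DF_A evaluated on xi

# Central finite-difference approximation of the directional derivative
h = 1e-6
fd = (F(A + h * xi) - F(A - h * xi)) / (2 * h)
assert np.allclose(DF, fd, atol=1e-6)
```

For this quadratic $F$ the central difference is exact up to roundoff, since the $h^2$ terms cancel.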

Now with this introduction out of the way, all the familiar rules of differential calculus (e.g. chain rule, product rule) work as usual. In your case, $V= M_{3\times 3}(\Bbb{R})$ (since $Q$ is a $3\times 3$ matrix) and $W = \Bbb{R}^n$, and so the product rule (differentiate each factor in turn, leaving all the others untouched and in the same order) implies that \begin{align} Df &= \sum_{k=1}^n Y_1\cdots Y_{k-1} \cdot(DY_k)\cdot Y_{k+1} \cdots Y_n \cdot x. \end{align} This is a slightly condensed notation, where we do not say where things are being evaluated. If we want to be slightly more explicit, then for all $Q,\xi\in V$ we have \begin{align} Df_Q[\xi] &= \sum_{k=1}^n Y_1(Q)\cdots Y_{k-1}(Q) \cdot(DY_k)_Q[\xi]\cdot Y_{k+1}(Q) \cdots Y_n(Q) \cdot x \quad \in W. \end{align} Without further information, this is about as simple as it gets.

This is a huge amount of information, because $Df_Q\in \mathcal{L}(V,W) = \mathcal{L}(M_{3\times 3}(\Bbb{R}), \Bbb{R}^n)$ is a linear transformation between spaces of large dimension; if you were to try to represent it as a matrix, things would get very ugly very quickly (any matrix representation of this linear map is of size $n\times 9$), so unless you really need the matrix form, I would avoid it.
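The product-rule formula can be sanity-checked numerically. The sketch below assumes a hypothetical choice $Y_k(Q) = c_k Q$ with $n = 3$ (so that $(DY_k)_Q[\xi] = c_k\,\xi$); none of these names come from the question. The directional derivative $Df_Q[\xi]$ from the sum matches a central finite difference:

```python
import numpy as np

# Hypothetical example: n = 3, Y_k(Q) = c_k * Q, so (DY_k)_Q[xi] = c_k * xi.
rng = np.random.default_rng(0)
n = 3
c = rng.standard_normal(n)
Q = np.linalg.qr(rng.standard_normal((3, 3)))[0]  # a random orthogonal matrix
xi = rng.standard_normal((3, 3))                  # direction of differentiation
x = rng.standard_normal(3)

def Y(k, Q):
    return c[k] * Q

def f(Q):
    v = x.copy()
    for k in reversed(range(n)):                  # Y_1 ... Y_n x, right to left
        v = Y(k, Q) @ v
    return v

def Df(Q, xi):
    """Df_Q[xi] = sum_k Y_1...Y_{k-1} (DY_k)_Q[xi] Y_{k+1}...Y_n x."""
    total = np.zeros(3)
    for k in range(n):
        v = x.copy()
        for j in reversed(range(k + 1, n)):       # apply Y_{k+1} ... Y_n
            v = Y(j, Q) @ v
        v = c[k] * (xi @ v)                       # (DY_k)_Q[xi] = c_k * xi
        for j in reversed(range(k)):              # apply Y_1 ... Y_{k-1}
            v = Y(j, Q) @ v
        total += v
    return total

# Finite-difference check of the directional derivative
h = 1e-6
fd = (f(Q + h * xi) - f(Q - h * xi)) / (2 * h)
assert np.allclose(Df(Q, xi), fd, atol=1e-6)
```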


$ \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\A{{\cal A}}\def\B{{\cal B}}\def\C{{\cal C}} $Introducing the partial-product notation
$$\eqalign{ P_{i,j} &= Y_{i}\;Y_{i+1}\cdots Y_{j}\qquad&(i\le j) \\ &= I &(i>j) \\ }$$
the differential and gradient of the function can be calculated as
$$\eqalign{ df &= \sum_{k=1}^n P_{1,k-1}\:dY_k\:(P_{k+1,n}\:x) \\ &= \sum_{k=1}^n \Big[P_{1,k-1}\star(P_{k+1,n}\:x)\Big]:dY_k \\ &= \sum_{k=1}^n\Big[P_{1,k-1}\star(P_{k+1,n}\:x)\Big]:\gradLR{Y_k}{Q}:dQ \\ \grad{f}{Q} &= \sum_{k=1}^n\Big[P_{1,k-1}\star(P_{k+1,n}\:x)\Big]:\gradLR{Y_k}{Q} \\ }$$
where $(\star)$ denotes the tensor product and $(:)$ the double-dot product
$$\eqalign{ &\B = A\star X \qiq \B_{ijkl} = A_{ij} X_{kl} \\ &G=M:\B \qiq G_{kl} = \sum_{i}\sum_{j} M_{ij}\B_{ijkl} \\ }$$

Since you didn't specify the $Y_k$ functions, I will assume that you already know how to calculate the tensor-valued gradients $\gradLR{Y_k}{Q}$.
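The partial-product formula can be sketched numerically with `einsum` carrying out the star and double-dot contractions. This is a hypothetical setup (the names, the choice $Y_k(Q) = c_k Q$ with $n = 3$, and hence the tensor gradient $\partial Y_k/\partial Q$, are all assumptions for illustration); the resulting gradient tensor is checked entry by entry against central finite differences:

```python
import numpy as np

# Hypothetical example: n = 3 and Y_k(Q) = c_k * Q, so the tensor gradient
# (dY_k/dQ)_{bcpq} = c_k * delta_{bp} * delta_{cq}.
rng = np.random.default_rng(1)
n = 3
c = rng.standard_normal(n)
Q = np.linalg.qr(rng.standard_normal((3, 3)))[0]
x = rng.standard_normal(3)

Ys = [c[k] * Q for k in range(n)]       # the factors Y_1(Q), ..., Y_n(Q)

def partial_product(i, j):
    """P_{i,j} = Y_i ... Y_j (identity when i > j); 0-based indices."""
    P = np.eye(3)
    for k in range(i, j + 1):
        P = P @ Ys[k]
    return P

eye = np.eye(3)
dYdQ = [c[k] * np.einsum('bp,cq->bcpq', eye, eye) for k in range(n)]

# grad f / dQ via the star / double-dot formula;
# index order of grad: (component of f, row of Q, column of Q)
grad = np.zeros((3, 3, 3))
for k in range(n):
    A = partial_product(0, k - 1)             # P_{1,k-1}
    v = partial_product(k + 1, n - 1) @ x     # P_{k+1,n} x
    # [P_{1,k-1} star (P_{k+1,n} x)] : dY_k/dQ, contracting the Y-entry
    # indices b, c against the tensor gradient
    grad += np.einsum('ab,bcpq,c->apq', A, dYdQ[k], v)

def f(Q):
    v = x.copy()
    for k in reversed(range(n)):
        v = (c[k] * Q) @ v
    return v

# Entry-by-entry finite-difference check of the gradient tensor
h = 1e-6
for p in range(3):
    for q in range(3):
        E = np.zeros((3, 3))
        E[p, q] = 1.0
        fd = (f(Q + h * E) - f(Q - h * E)) / (2 * h)
        assert np.allclose(grad[:, p, q], fd, atol=1e-6)
```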