I am attempting to derive the analytical formula for the Hessian matrix of a the second derivatives of the value of an angle in terms of the (9) coordinates of the 3 3D points that define it. While I have derived a formula for this (given below), when I attempt to validate it numerically, I get incorrect answers; e.g., non-symmetric matrices. I suspect that I have misunderstood or misapplied some principle of vector calculus in coming up with these formulae and would appreciate it if anyone can either (a) find the error, (b) provide an alternate formula, or (c) point me toward a derivation elsewhere.
Consider the following three points:
$a = (x_a, y_a, z_a)\\ b = (x_b, y_b, z_b)\\ c = (x_c, y_c, z_c)$
Let $\theta = \arccos\left( \frac{b - a}{||b - a||} \cdot \frac{c - a}{||c - a||} \right)$ be the angle between vectors $\mathbf{v}_{ab} = b-a$ and $\mathbf{v}_{ac} = c-a$. Additionally, let $\mathbf{u}_{ab} = \frac{b - a}{||b - a||}$ and $\mathbf{u}_{ac} = \frac{c - a}{||c - a||}$.
I am performing a minimization for which I need a closed form of both the gradient vector and Hessian matrix of $\theta$ in terms of the coordinates of the points $a$, $b$, and $c$. The details of the problem are not important aside from the fact that estimation is not an option due to the size of the problem. Formally, the formula I need are:
$\nabla \theta = \begin{pmatrix} \partial\theta/\partial x_a\\ \partial\theta/\partial y_a\\ \partial\theta/\partial z_a\\ \vdots\\ \partial\theta/\partial z_c \end{pmatrix}\;$ and $\;\mathbf{H}(\theta) = \begin{pmatrix} \partial^2 \theta / \partial x_a^2 & \partial^2 \theta / \partial x_a \partial y_a & \cdots & \partial^2 \theta / \partial x_a \partial z_c \\ \partial^2 \theta / \partial y_a \partial x_a & \partial^2 \theta / \partial y_a^2 & \cdots & \partial^2 \theta / \partial y_a \partial z_c \\ \vdots & \vdots & \ddots & \vdots \\ \partial^2 \theta / \partial z_c \partial x_z & \partial^2 \theta / \partial z_c \partial y_a & \cdots & \partial^2 \theta / \partial z_c^2 \end{pmatrix}$
I have already found the gradient analytically and tested it numerically; I define it as follows:
$\nabla \theta = \begin{pmatrix} \theta_a \\ \theta_b \\ \theta_c \end{pmatrix}$ where $\theta_a$, $\theta_b$, and $\theta_c$ are column vectors representing $\nabla \theta$ in terms of points $a$, $b$, and $c$:
$\theta_a = \begin{pmatrix} \partial \theta / \partial x_a \\ \partial \theta / \partial y_a \\ \partial \theta / \partial z_a \\\end{pmatrix} = -(\theta_b + \theta_c) \\ \theta_b = \begin{pmatrix} \partial \theta / \partial x_b \\ \partial \theta / \partial y_b \\ \partial \theta / \partial z_b \\\end{pmatrix} = \frac{1}{||c-b||}(\mathbf{u}_{ab} \cot\theta - \mathbf{u}_{ac} \csc\theta)\\ \theta_c = \begin{pmatrix} \partial \theta / \partial x_c \\ \partial \theta / \partial y_c \\ \partial \theta / \partial z_c \\\end{pmatrix} = \frac{1}{||c-b||}(\mathbf{u}_{ac} \cot\theta - \mathbf{u}_{ab} \csc\theta) $
I would like to derive the Hessian using this gradient as a starting place; i.e., if I could find $\mathbf{J}(\theta_b)$ and $\mathbf{J}(\theta_c)$ (where $\mathbf{J}(f)$ is the Jacobian matrix of $f$) in terms of the coordinates $x_a, y_a ... z_c$, expressing the Hessian would be a trivial matter of packing the Jacobians into a larger matrix:
$\mathbf{H}(\theta) = \begin{pmatrix} \mathbf{J}(\theta_a) & \mathbf{J}(\theta_b) & \mathbf{J}(\theta_c) \end{pmatrix}$
To further simplify, I can split the Jacobians further into the following $3\times 3$ sub-matrices and express the Hessian using them:
$\mathbf{H}(\theta) = \begin{pmatrix} \theta_{aa} & \theta_{ba} & \theta_{ca} \\ \theta_{ab} & \theta_{bb} & \theta_{cb} \\ \theta_{ac} & \theta_{bc} & \theta_{cc} \end{pmatrix}$ where
$\theta_{ij} = \begin{pmatrix} \partial \theta_i / \partial x_j & \partial \theta_i / \partial y_j & \partial \theta_i / \partial z_j \end{pmatrix} = \begin{pmatrix} \partial^2 \theta / \partial x_i \partial x_j & \partial^2 \theta / \partial x_i \partial y_j & \partial^2 \theta / \partial x_i \partial z_j \\ \partial^2 \theta / \partial y_i \partial x_j & \partial^2 \theta / \partial y_i \partial y_j & \partial^2 \theta / \partial y_i \partial z_j \\ \partial^2 \theta / \partial z_i \partial x_j & \partial^2 \theta / \partial z_i \partial y_j & \partial^2 \theta / \partial z_i \partial z_j \\ \end{pmatrix}$.
Note that $\theta_{ij} = \theta_{ji}^\intercal$.
I now go about deriving the functions $\theta_{ij}$, starting with $\theta_{ba}$, $\theta_{bb}$, $\theta_{bc}$, $\theta_{ca}$, and $\theta_{cc}$. Using these, the remainder will be trivial to derive. I formulate these matrices by considering the dot product of a column vector, $\theta_i$ with $\nabla_j^\intercal = \begin{pmatrix} \partial/\partial x_j & \partial/\partial y_j & \partial/\partial z_j \end{pmatrix}$: $\theta_{ij} = \theta_i \cdot \nabla_j^\intercal$. To simplify the derivation slightly, note that the following can be easily defined:
$\mathbf{J}_i(\mathbf{u}_{ij}) = \begin{pmatrix} \partial x_{\mathbf{u}_{ij}} / \partial x_i & \partial x_{\mathbf{u}_{ij}} / \partial y_i & \partial x_{\mathbf{u}_{ij}} / \partial z_i \\ \partial y_{\mathbf{u}_{ij}} / \partial x_i & \partial y_{\mathbf{u}_{ij}} / \partial y_i & \partial y_{\mathbf{u}_{ij}} / \partial z_i \\ \partial z_{\mathbf{u}_{ij}} / \partial x_i & \partial z_{\mathbf{u}_{ij}} / \partial y_i & \partial z_{\mathbf{u}_{ij}} / \partial z_i \\ \end{pmatrix} = \mathbf{v}_{ij} \cdot \mathbf{v}_{ij}^\intercal - ||\mathbf{v}_{ij}||^2 \mathbf{I}$.
Additionally, if we let $\gamma(i,j) = 1/||\mathbf{v}_{ij}||$ (this cleans up the notation slightly), then we have:
$\nabla_i \gamma(i,j) = \begin{pmatrix} \partial \gamma(i,j) / \partial x_i \\ \partial \gamma(i,j) / \partial y_i \\ \partial \gamma(i,j) / \partial z_i \end{pmatrix} = \mathbf{v}_{ij} / ||\mathbf{v}_{ij}||^3$.
We can now begin the derivation; using the notation defined above, we can start with $\theta_{ba}$:
$\begin{align} \theta_{ba} &= \theta_b \cdot \nabla_a^\intercal \\ &= (\gamma(b,c) \mathbf{u}_{ab}\cot\theta - \gamma(b,c) \mathbf{u}_{ac}\csc\theta) \cdot \nabla_a^\intercal \\ &= (\gamma(b,c) \mathbf{u}_{ab}\cot\theta)\cdot \nabla_a^\intercal - (\gamma(b,c) \mathbf{u}_{ac}\csc\theta) \cdot \nabla_a^\intercal \\ &= \gamma(b,c)( (\cot\theta\, \mathbf{J}_a(\mathbf{u}_{ab}) - \csc^2\theta\, (\mathbf{u}_{ab} \cdot \theta_a^\intercal)) - (\csc\theta\, \mathbf{J}_a(\mathbf{u}_{ac}) - \csc\theta \cot\theta\, (\mathbf{u}_{ac} \cdot \theta_a^\intercal)) \end{align}$
The remainder are given below, but note that even looking just at $\theta_{ba}$, the formula given here does not produce answers that match numerical derivations of the Hessian.
$\begin{align} \theta_{bb} &= \theta_b \cdot \nabla_b^\intercal = (\gamma(b,c) \mathbf{u}_{ab}\cot\theta - \gamma(b,c) \mathbf{u}_{ac}\csc\theta) \cdot \nabla_b^\intercal \\ &= \cot\theta\,(\mathbf{u}_{ab} \cdot \nabla_b\gamma(b,c)^\intercal) + \gamma(b,c)\cot\theta\,\mathbf{J}_b(\mathbf{u}_{ab}) - \gamma(b,c)\csc^2\theta\,(\mathbf{u}_{ab} \cdot \theta_b^\intercal) - \csc\theta\,(\mathbf{u}_{ac} \cdot \nabla_b\gamma(b,c)^\intercal) + \gamma(b,c)\csc\theta\cot\theta\,(\mathbf{u}_{ac}\cdot\theta_b^\intercal) \end{align}$
$\begin{align} \theta_{bc} &= \theta_b \cdot \nabla_c^\intercal = (\gamma(b,c) \mathbf{u}_{ab}\cot\theta - \gamma(b,c) \mathbf{u}_{ac}\csc\theta) \cdot \nabla_c^\intercal \\ &= \cot\theta\,(\mathbf{u}_{ab} \cdot \nabla_c\gamma(b,c)^\intercal) - \gamma(b,c)\csc^2\theta\,(\mathbf{u}_{ab} \cdot \theta_c^\intercal) - \csc\theta\,(\mathbf{u}_{ac} \cdot \nabla_\gamma(b,c)^\intercal) - \gamma(b,c)\csc\theta\,\mathbf{J}_c(\mathbf{u}_{ac}) + \gamma(b,c)\csc\theta\cot\theta\,(\mathbf{u}_{ac}\cdot\theta_c^\intercal) \end{align}$
$\begin{align} \theta_{ca} &= \theta_c \cdot \nabla_a^\intercal = (\gamma(b,c)\cot\theta\,\mathbf{u}_{ac} + \gamma(b,c)\csc\theta\,\mathbf{u}_{ab}) \cdot \nabla_a^\intercal \\ &= \gamma(b,c)( \cot\theta\,\mathbf{J}_a(\mathbf{u}_{ac}) - \csc^2\theta\,(\mathbf{u}_{ac} \cdot \theta_a^\intercal) - \csc\theta\,\mathbf{J}_a(\mathbf{u}_{ab}) + \csc\theta\cot\theta\,(\mathbf{u}_{ab} \cdot \theta_a^\intercal) \end{align}$
$\begin{align} \theta_{cc} &= \theta_c \cdot \nabla_c^\intercal = (\gamma(b,c)\cot\theta\,\mathbf{u}_{ac} + \gamma(b,c)\csc\theta\,\mathbf{u}_{ab}) \cdot \nabla_c^\intercal \\ &= \cot\theta\,(\mathbf{u}_{ac} \cdot \nabla_c\gamma(b,c)^\intercal) + \gamma(b,c)\cot\theta\,\mathbf{J}_c(\mathbf{u}_{ac}) - \gamma(b,c)\csc^2\theta\, (\mathbf{u}_{ac} \cdot \theta_c^\intercal) - \csc\theta\,(\mathbf{u}_{ab} \cdot \nabla_c\gamma(b,c)^\intercal) + \gamma(b,c)\cot\theta\csc\theta\,(\mathbf{u}_{ab} \cdot \theta_c^\intercal) \end{align}$
Many years ago I had similar tasks in computer geometry. The formula is too complicated and it is too easy to miss it. So I decided to develop a C++ template class that holds some quantity --- a number --- together with its first two derivatives with respect to some real parameters. (The number of the real parameters is a template parameters) So the class contains a value, a gradient vector and a Hessian matix. Then I implemented the usual arithmetic operations and functions (+, *, /, exp, log, atan etc.) working with that class. Having this code, we just need to build the formula (e.g. compute the error in the surface fitted) and we will have the derivatives wrt. the real parameters --- everything we need for Newton-Raphson.