I'm solving a very large nonlinear minimization problem using a gradient-descent method. The objective function $F$ is a sum of terms computed over a 2D triangle mesh; each vertex $i$ in the mesh has an associated field value $z_i$, and the real-valued vector $\boldsymbol{z}$ is the minimization variable, while the $(x,y)$ positions of the vertices are fixed. That is, I'm computing $\arg\min_\boldsymbol{z} F(\boldsymbol{z})$.
One of the terms in the objective function promotes smoothness of the mesh: it operates on every pair of triangles that share an edge, and penalizes the pair when the dot product of their normal vectors is close to $0$. For simplicity, assume we have just the minimal triangle mesh over which this term operates: 4 vertices forming 2 triangles, with 5 edges in total.
- Let triangle 1 be denoted by the points $\boldsymbol{p}_1 = (x_1, y_1, z_1)$, $\boldsymbol{p}_a = (x_a, y_a, z_a)$, and $\boldsymbol{p}_b = (x_b, y_b, z_b)$ in counter-clockwise order.
- Let triangle 2 be denoted by the points $\boldsymbol{p}_2 = (x_2, y_2, z_2)$, $\boldsymbol{p}_b$, and $\boldsymbol{p}_a$ in counter-clockwise order. (I.e., the shared edge between the triangles is edge $a$-$b$ and the non-shared vertices are points 1 and 2).
- Let $\boldsymbol{u}_{1a} = \boldsymbol{p}_a - \boldsymbol{p}_1$, $\boldsymbol{u}_{1b} = \boldsymbol{p}_b - \boldsymbol{p}_1$, $\boldsymbol{u}_{2a} = \boldsymbol{p}_a - \boldsymbol{p}_2$, and $\boldsymbol{u}_{2b} = \boldsymbol{p}_b - \boldsymbol{p}_2$.
- Let $\boldsymbol{v}_1$ and $\boldsymbol{v}_2$ denote the vectors normal to the plane of each triangle: $\boldsymbol{v}_1 = \boldsymbol{u}_{1a} \times \boldsymbol{u}_{1b}$; $\boldsymbol{v}_2 = \boldsymbol{u}_{2b} \times \boldsymbol{u}_{2a}$
- The objective potential for these two triangles is $F(\boldsymbol{z}) = \frac{1}{2} - \frac{1}{2} \left( \frac{\boldsymbol{v}_1 \cdot \boldsymbol{v}_2}{|\boldsymbol{v}_1| |\boldsymbol{v}_2|} \right)^2$
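For concreteness, the setup above can be sketched numerically. This is a minimal sketch, not production code; the fixed $(x,y)$ coordinates below are arbitrary example values, not from the actual problem.

```python
import numpy as np

# Arbitrary fixed (x, y) positions for p1, pa, pb, p2 (example values only).
XY = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def objective(z):
    """Per-pair smoothness term F(z) for z = (z1, za, zb, z2)."""
    p1, pa, pb, p2 = (np.append(XY[i], zi) for i, zi in enumerate(z))
    v1 = np.cross(pa - p1, pb - p1)   # normal of triangle 1 (CCW: p1, pa, pb)
    v2 = np.cross(pb - p2, pa - p2)   # normal of triangle 2 (CCW: p2, pb, pa)
    c = (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return 0.5 - 0.5 * c**2

print(objective(np.zeros(4)))   # coplanar triangles -> 0.0
```

Note that $F$ is $0$ when the normals are parallel and reaches its maximum of $\tfrac{1}{2}$ when they are perpendicular, which is what makes it a bounded smoothness penalty.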
The problem I'm having is finding a simple computable form of the gradient for this function in terms of $\boldsymbol{z} = (z_1, z_a, z_b, z_2)$. What is $\nabla_\boldsymbol{z} F$?
What I've tried:
- I have plugged the expression into Mathematica and obtained a technically correct result, but it is long and unwieldy, and copy-pasting it into my minimizer gives poor performance. Nor does it offer any intuition about the gradient the way a more human-readable form would.
- Consider the alternative objective function $G(\boldsymbol{z}) = - \frac{1}{2} \left( \boldsymbol{v}_1 \cdot \boldsymbol{v}_2 \right)^2$:
- $\nabla_\boldsymbol{z} G = -(\boldsymbol{v}_1 \cdot \boldsymbol{v}_2) (\mathbf{J}_{\boldsymbol{v}_1} \boldsymbol{v}_2 + \mathbf{J}_{\boldsymbol{v}_2} \boldsymbol{v}_1)$, where $\mathbf{J}_{\boldsymbol{u}}$ is the Jacobian matrix of $\boldsymbol{u}$ with respect to $\boldsymbol{z}$, written with one row per entry of $\boldsymbol{z}$:
- $\mathbf{J}_{\boldsymbol{v}_1} = \begin{pmatrix} y_b-y_a & x_a-x_b & 0 \\ y_1-y_b & x_b-x_1 & 0 \\ y_a-y_1 & x_1-x_a & 0 \\ 0& 0& 0 \end{pmatrix}$
- $\mathbf{J}_{\boldsymbol{v}_2} = \begin{pmatrix} 0 & 0 & 0 \\ y_b-y_2 & x_2-x_b & 0 \\ y_2-y_a & x_a-x_2 & 0 \\ y_a-y_b & x_b-x_a & 0 \end{pmatrix}$
- Unfortunately, $G$ is not a good smoothness objective on its own: it is unbounded below, so the minimizer will drive $(\boldsymbol{v}_1 \cdot \boldsymbol{v}_2)^2$ to infinity by scaling $\boldsymbol{z}$. Still, intuitively, $\nabla_\boldsymbol{z} F$ should resemble $\nabla_\boldsymbol{z} G$.
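Before trusting any closed form, it is worth checking the $\nabla_\boldsymbol{z} G$ expression against finite differences. This is a sketch under the same assumptions as above (arbitrary fixed $(x,y)$ coordinates; the Jacobian matrices are the ones written out in the question):

```python
import numpy as np

# Arbitrary fixed (x, y) positions for p1, pa, pb, p2 (example values only).
XY = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def normals(z):
    p1, pa, pb, p2 = (np.append(XY[i], zi) for i, zi in enumerate(z))
    return np.cross(pa - p1, pb - p1), np.cross(pb - p2, pa - p2)

def G(z):
    v1, v2 = normals(z)
    return -0.5 * (v1 @ v2) ** 2

def jacobians():
    """Jacobians of v1 and v2 w.r.t. z, laid out with one row per z-entry."""
    (x1, y1), (xa, ya), (xb, yb), (x2, y2) = XY
    J1 = np.array([[yb - ya, xa - xb, 0.0],
                   [y1 - yb, xb - x1, 0.0],
                   [ya - y1, x1 - xa, 0.0],
                   [0.0,     0.0,     0.0]])
    J2 = np.array([[0.0,     0.0,     0.0],
                   [yb - y2, x2 - xb, 0.0],
                   [y2 - ya, xa - x2, 0.0],
                   [ya - yb, xb - xa, 0.0]])
    return J1, J2

def grad_G(z):
    v1, v2 = normals(z)
    J1, J2 = jacobians()
    return -(v1 @ v2) * (J1 @ v2 + J2 @ v1)

# Central finite differences at a random point.
z = np.random.default_rng(0).normal(size=4)
eps = 1e-6
fd = np.array([(G(z + eps * e) - G(z - eps * e)) / (2 * eps) for e in np.eye(4)])
print(np.max(np.abs(fd - grad_G(z))))   # small if the formula is right
```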
Any help deriving $\nabla_\boldsymbol{z} F$ would be much appreciated.
Edit
After some thought, I came up with another approach; I would appreciate it if anyone can confirm that it is right.
- Let $\mathbf{J}_f(\boldsymbol{x})$ denote the Jacobian matrix of the vector-valued function $f$ with respect to the entries of $\boldsymbol{x}$, laid out with one row per entry of $\boldsymbol{x}$ (as in the matrices above): if $\mathbf{J}_f(\boldsymbol{x})$ is size $m \times n$, then $m$ is the length of $\boldsymbol{x}$ and $n$ is the length of the vector output of $f$. (This is the transpose of the common convention, but it lets the chain rule below compose left-to-right and makes the final gradient a vector of the same length as $\boldsymbol{z}$.)
- As in the gradient of $G$ above, the product rule for the dot product gives $\mathbf{J}_{\left( \frac{\boldsymbol{v}_1 \cdot \boldsymbol{v}_2}{|\boldsymbol{v}_1| |\boldsymbol{v}_2|} \right)}(\boldsymbol{z}) = \mathbf{J}_{\frac{\boldsymbol{v}_1}{|\boldsymbol{v}_1|}}(\boldsymbol{z}) \cdot \frac{\boldsymbol{v}_2}{|\boldsymbol{v}_2|} + \mathbf{J}_{\frac{\boldsymbol{v}_2}{|\boldsymbol{v}_2|}}(\boldsymbol{z}) \cdot \frac{\boldsymbol{v}_1}{|\boldsymbol{v}_1|}$
- By the chain rule, this expands to: $\mathbf{J}_{\left( \frac{\boldsymbol{v}_1 \cdot \boldsymbol{v}_2}{|\boldsymbol{v}_1| |\boldsymbol{v}_2|} \right)}(\boldsymbol{z}) = \mathbf{J}_{\boldsymbol{v}_1}(\boldsymbol{z}) \cdot \mathbf{J}_{\frac{\boldsymbol{v}_1}{|\boldsymbol{v}_1|}}(\boldsymbol{v}_1)\cdot\frac{\boldsymbol{v}_2}{|\boldsymbol{v}_2|} + \mathbf{J}_{\boldsymbol{v}_2}(\boldsymbol{z})\cdot\mathbf{J}_{\frac{\boldsymbol{v}_2}{|\boldsymbol{v}_2|}}(\boldsymbol{v}_2) \cdot \frac{\boldsymbol{v}_1}{|\boldsymbol{v}_1|}$
- We already know $\mathbf{J}_{\boldsymbol{v}_1}(\boldsymbol{z})$ and $\mathbf{J}_{\boldsymbol{v}_2}(\boldsymbol{z})$ (above)
- $\mathbf{J}_{(\boldsymbol{v}/|\boldsymbol{v}|)}(\boldsymbol{v}) = \frac{1}{|\boldsymbol{v}|^{3}} \left(|\boldsymbol{v}|^2\mathbf{I} - \boldsymbol{v} \boldsymbol{v}^\intercal\right)$, where $\boldsymbol{v} \boldsymbol{v}^\intercal$ is the outer product; note that this matrix is symmetric, so its layout convention doesn't matter.
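This normalization Jacobian is easy to verify in isolation with finite differences (a quick sketch; the test vector is an arbitrary choice):

```python
import numpy as np

def J_normalize(v):
    """Jacobian of v/|v| with respect to v; note it is symmetric."""
    n = np.linalg.norm(v)
    return (n**2 * np.eye(3) - np.outer(v, v)) / n**3

v = np.array([0.3, -1.2, 2.0])   # arbitrary test vector
eps = 1e-6
unit = lambda w: w / np.linalg.norm(w)
fd = np.column_stack([(unit(v + eps * e) - unit(v - eps * e)) / (2 * eps)
                      for e in np.eye(3)])
print(np.max(np.abs(fd - J_normalize(v))))   # small if the formula is right
```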
So we have:
$\mathbf{J}_{\left( \frac{\boldsymbol{v}_1 \cdot \boldsymbol{v}_2}{|\boldsymbol{v}_1| |\boldsymbol{v}_2|} \right)}(\boldsymbol{z}) = \frac{1}{|\boldsymbol{v}_1|^{3}} \mathbf{J}_{\boldsymbol{v}_1}(\boldsymbol{z}) \left(|\boldsymbol{v}_1|^2\mathbf{I} - \boldsymbol{v}_1 \boldsymbol{v}_1^\intercal\right) \frac{\boldsymbol{v}_2}{|\boldsymbol{v}_2|} + \frac{1}{|\boldsymbol{v}_2|^{3}} \mathbf{J}_{\boldsymbol{v}_2}(\boldsymbol{z}) \left(|\boldsymbol{v}_2|^2\mathbf{I} - \boldsymbol{v}_2 \boldsymbol{v}_2^\intercal\right) \frac{\boldsymbol{v}_1}{|\boldsymbol{v}_1|}$

and finally, writing $c = \frac{\boldsymbol{v}_1 \cdot \boldsymbol{v}_2}{|\boldsymbol{v}_1| |\boldsymbol{v}_2|}$, since $F = \frac{1}{2} - \frac{1}{2} c^2$:

$\nabla_\boldsymbol{z} F = -c \, \mathbf{J}_{c}(\boldsymbol{z})$
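Finally, the full expression can be checked end-to-end against finite differences of $F$ itself. This is a sketch with arbitrary fixed $(x,y)$ coordinates; `grad_F` applies the chain rule through $F = \tfrac{1}{2} - \tfrac{1}{2} c^2$, i.e. it multiplies the Jacobian above by $-c$:

```python
import numpy as np

# Arbitrary fixed (x, y) positions for p1, pa, pb, p2 (example values only).
XY = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def normals(z):
    p1, pa, pb, p2 = (np.append(XY[i], zi) for i, zi in enumerate(z))
    return np.cross(pa - p1, pb - p1), np.cross(pb - p2, pa - p2)

def F(z):
    v1, v2 = normals(z)
    c = (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return 0.5 - 0.5 * c**2

def grad_F(z):
    (x1, y1), (xa, ya), (xb, yb), (x2, y2) = XY
    J1 = np.array([[yb - ya, xa - xb, 0.0],
                   [y1 - yb, xb - x1, 0.0],
                   [ya - y1, x1 - xa, 0.0],
                   [0.0,     0.0,     0.0]])
    J2 = np.array([[0.0,     0.0,     0.0],
                   [yb - y2, x2 - xb, 0.0],
                   [y2 - ya, xa - x2, 0.0],
                   [ya - yb, xb - xa, 0.0]])
    v1, v2 = normals(z)
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    P1 = n1**2 * np.eye(3) - np.outer(v1, v1)   # |v1|^2 I - v1 v1^T
    P2 = n2**2 * np.eye(3) - np.outer(v2, v2)
    grad_c = J1 @ P1 @ (v2 / n2) / n1**3 + J2 @ P2 @ (v1 / n1) / n2**3
    c = (v1 @ v2) / (n1 * n2)
    return -c * grad_c                          # chain rule: F = 1/2 - c^2/2

z = np.random.default_rng(1).normal(size=4)
eps = 1e-6
fd = np.array([(F(z + eps * e) - F(z - eps * e)) / (2 * eps) for e in np.eye(4)])
print(np.max(np.abs(fd - grad_F(z))))   # small if the derivation is right
```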