Understanding Newton method when optimizing cost function depending on a triangle.

Question


256 Views Asked by Bumbble Comm At 11 May 2026 - 7:15

I have the following function defined from $\mathbb{R}^9 \to \mathbb{R}$

\begin{equation} f(x) = f(x_1,x_2,x_3) = \frac{1}{2}(\left\langle n,e_3 \right\rangle - \lVert n \rVert)^2 \end{equation}

Basically it's a function of the vertices of a given triangle and we also have $e_3 = (0,0,1)^T$ and $n = (x_2 - x_1)\times (x_3 - x_1)$. Computing the gradient w.r.t. $x$ leads me to
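To make the definition concrete, here is a minimal sketch that evaluates $f$ for a given triangle. The helper names (`sub`, `cross`, `norm`, `cost`) and the sample triangles are illustrative assumptions, not part of the original MATLAB code.

```python
import math

def sub(u, v):
    """Component-wise difference u - v of two 3-vectors."""
    return tuple(a - b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in u))

def cost(x1, x2, x3):
    """f = 1/2 (<n, e3> - ||n||)^2 with n = (x2 - x1) x (x3 - x1)."""
    n = cross(sub(x2, x1), sub(x3, x1))
    return 0.5 * (n[2] - norm(n)) ** 2   # <n, e3> is just the z-component of n

# Hypothetical sample triangles: one tilted, one lying in the xy-plane.
tilted = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0))
flat = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(cost(*tilted), cost(*flat))
```

For the tilted triangle $n = (0,-1,1)$, so $f = \tfrac12(1-\sqrt2)^2$; for the flat one $n = e_3$ and $f = 0$.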

$$ \nabla f = \left(\left\langle \frac{n}{\lVert n \rVert} ,e_3 \right\rangle - 1\right)^2 \begin{pmatrix} (x_2 - x_3) \times n \\ (x_3 - x_1) \times n \\ (x_1 - x_2) \times n \end{pmatrix} $$

I'm trying to give a rigorous interpretation of that gradient. If I set it to $0$ I can get the minimum (though there's an issue with the $\lVert n \rVert$ term). I've implemented this gradient in MATLAB to minimize the function, and this is what I'm getting:

The algorithm MATLAB is using is Newton's method. I'm not sure if it's visible, but it seems the triangle has been rotated so that it is parallel to the xy-plane.

However, if I used gradient descent instead of Newton's method, then, if I'm reading the gradient correctly, it seems to me the three vertices would be pulled together until the triangle degenerates to a single point.

Why is the Newton method attempting to rotate instead of warping the triangle?
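One way to probe this expectation is to run plain gradient descent and watch how the area and the tilt of the triangle evolve. The following is a minimal sketch, not the MATLAB code from the question: it assumes a central finite-difference gradient (`num_grad`), a simple step-halving line search, and an arbitrary starting triangle; all function names are made up for illustration.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normal(x):
    """Normal n = (x2 - x1) x (x3 - x1); x is a flat list of 9 coordinates."""
    a, b, c = x[0:3], x[3:6], x[6:9]
    return cross([b[i] - a[i] for i in range(3)],
                 [c[i] - a[i] for i in range(3)])

def cost(x):
    n = normal(x)
    return 0.5 * (n[2] - math.sqrt(sum(t * t for t in n))) ** 2

def num_grad(x, h=1e-6):
    """Central finite-difference approximation of the gradient (assumption)."""
    g = []
    for i in range(9):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((cost(xp) - cost(xm)) / (2 * h))
    return g

def descend(x, iters=200):
    """Gradient descent; the step is halved until the cost strictly decreases."""
    x = list(x)
    for _ in range(iters):
        g, f0, step = num_grad(x), cost(x), 1.0
        while step > 1e-12:
            trial = [xi - step * gi for xi, gi in zip(x, g)]
            if cost(trial) < f0:
                x = trial
                break
            step *= 0.5
        else:
            break  # no improving step found: stop iterating
    return x

def area_and_cos(x):
    """Triangle area and cosine of the angle between n and the z-axis."""
    n = normal(x)
    ln = math.sqrt(sum(t * t for t in n))
    return 0.5 * ln, (n[2] / ln if ln > 0 else float("nan"))

x0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]  # hypothetical start
x_end = descend(x0)
print("start:", cost(x0), area_and_cos(x0))
print("end:  ", cost(x_end), area_and_cos(x_end))
```

Comparing the printed area and cosine before and after shows how much of the decrease comes from shrinking and how much from rotating for this particular start; other starting triangles or step rules may behave differently.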


Bumbble Comm On 28 Aug 2018 - 3:40

Why is the Newton method attempting to rotate instead of warping the triangle?

I'm not sure what you mean by "warping the triangle", but...

Clearly your cost function attains its minimum value of $0$ at exactly the points $(x_1, x_2, x_3)$ for which the $x$ and $y$ components of $n$ are zero and its $z$ component is nonnegative. So any solution necessarily has $x_1, x_2, x_3$ all lying in some horizontal plane $z = C$. The Newton optimizer is simply converging to one of these solutions.

You of course have the issues that

- As indicated above, your optimum is (highly) non-unique. You could remedy this by introducing some sort of regularization, e.g. a similarity constraint as suggested by @Christian Blatter in the comments.
- Your cost function is not differentiable on the subspace $n=0$. It appears as if the Newton optimizer is currently converging to a valid minimum away from this subspace, but you would probably be safer using a globally differentiable cost function. Note that, alternatively, the above regularization suggestion would also remedy this issue by fixing $\| n \| > 0$.

but I believe that first paragraph answers your explicit question.
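The description of the minimizers can be checked on a small example. The sketch below (helper names and the test triangle are assumptions for illustration) evaluates the cost for a horizontal triangle in the plane $z = 5$ with both vertex orderings. Since $\langle n, e_3\rangle = \lVert n\rVert$ requires $n$ to point in the $+z$ direction, only the ordering whose normal points upward gives $f = 0$.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def cost(x1, x2, x3):
    """f = 1/2 (<n, e3> - ||n||)^2 with n = (x2 - x1) x (x3 - x1)."""
    e1 = tuple(b - a for a, b in zip(x1, x2))
    e2 = tuple(c - a for a, c in zip(x1, x3))
    n = cross(e1, e2)
    return 0.5 * (n[2] - math.sqrt(sum(t * t for t in n))) ** 2

p, q, r = (0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)

f_up = cost(p, q, r)     # n = (0, 0, 1): normal points up
f_down = cost(p, r, q)   # n = (0, 0, -1): same triangle, reversed ordering
print(f_up, f_down)      # 0.0 and 2.0
```

The constant $C = 5$ is arbitrary: the cost depends on the vertices only through edge differences, so translating the triangle leaves it unchanged, which is one source of the non-uniqueness mentioned above.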

Bumbble Comm On 01 Sep 2018 - 2:36

We can figure out the behavior of $f$ qualitatively, too.
It turns out that $||n||$ is equal to twice the area of the triangle. (This is because the area of a triangle is half the size of the cross product of any two sides.)
Let $\theta_z$ be the angle between $n$ and the z-axis. Then $\langle n, e_3\rangle = ||n||\, ||e_3|| \cos{(\theta_z)} = ||n||\cos{(\theta_z)}$, so our expression becomes $$f(x_1,x_2,x_3) = \frac{||n||^2}{2}\left(\cos{(\theta_z)}-1\right)^2$$
So if $A$ is the area of the triangle (so that $||n|| = 2A$), and $\theta_z$ is the angle between the normal and the z-axis, we have $$f(x_1,x_2,x_3) = 2A^2\cdot(\cos(\theta_z)-1)^2.$$
So, there are two ways to shrink $f$: by shrinking the area, or by rotating the triangle so that it is perpendicular to the z-axis (making $\cos(\theta_z) = 1$). Depending on the size of the triangle, gradient descent will result in some combination of these two changes.
The minimum value of $f$ occurs when the area of the triangle shrinks to nothing and/or when the triangle is perpendicular to the z-axis. In this case, $f$ is zero.
I believe Newton's method is rotating the triangle to make it more perpendicular to the z-axis. It may also be warping the triangle, i.e. moving its vertices to shrink the area of the triangle. I don't know which strategy the gradient favors; maybe it depends on the particular triangle, e.g. its initial orientation? As a complete guess, I wonder if it does something like translate the vertices in the z-direction toward the geometric center of the triangle.
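The area/angle form of the cost can be sanity-checked numerically. Substituting $||n|| = 2A$ and $\langle n, e_3\rangle = 2A\cos(\theta_z)$ into the definition gives $f = 2A^2(\cos(\theta_z)-1)^2$. The sketch below compares that expression with the direct definition on a few arbitrary triangles; the function names and sample points are assumptions for illustration only.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normal(a, b, c):
    """n = (b - a) x (c - a)."""
    return cross([b[i] - a[i] for i in range(3)],
                 [c[i] - a[i] for i in range(3)])

def cost_direct(a, b, c):
    """f = 1/2 (<n, e3> - ||n||)^2, straight from the definition."""
    n = normal(a, b, c)
    return 0.5 * (n[2] - math.sqrt(sum(t * t for t in n))) ** 2

def cost_geometric(a, b, c):
    """f = 2 A^2 (cos(theta_z) - 1)^2, with A the triangle area."""
    n = normal(a, b, c)
    length = math.sqrt(sum(t * t for t in n))
    area = 0.5 * length
    cos_theta = n[2] / length     # requires a non-degenerate triangle
    return 2.0 * area ** 2 * (cos_theta - 1.0) ** 2

# Arbitrary non-degenerate sample triangles.
samples = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)),
    ((1.0, 2.0, 3.0), (4.0, 0.5, -1.0), (-2.0, 1.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)),
]
for tri in samples:
    print(cost_direct(*tri), cost_geometric(*tri))
```

Because $f$ scales with $A^2$, scaling a triangle down by a factor $s$ multiplies the cost by $s^4$, while rotating it toward horizontal drives the $(\cos(\theta_z)-1)^2$ factor to zero.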

**Bumbble Comm** · Accepted Answer

$\newcommand{\v}{\mathbf}$ According to my math (see end of post), I get a slightly different answer for the gradient, which may explain the unexpected behavior:

$$(\nabla f)(\v{a},\v{b},\v{c}) = 2A {[\cos{(\theta_z)}-1]} \cdot \begin{bmatrix}\v{R}\bullet (\v{b}-\v{c}) \,+\, (\v{b}-\v{c})\times \widehat{n}\\\v{R}\bullet (\v{c}-\v{a}) \,+\, (\v{c}-\v{a})\times\widehat{n}\\\v{R}\bullet (\v{a}-\v{b}) \,+\, (\v{a}-\v{b})\times \widehat{n}\end{bmatrix}$$

Here, $A$ is the area of the triangle $(A = ||n||/2)$, and $\theta_z$ is the angle between the triangle normal and the $z$-axis (so $\cos{\theta_z} = \langle \widehat{n}, \v{e}_3\rangle$). Importantly, this shows geometrically that the magnitude of change depends on the area of the triangle and its orientation: the smaller the area, or the closer the triangle is to being parallel to the x-y plane, the less change there is. In particular, the area will quit shrinking if the triangle is parallel to the x-y plane.

The quantity $A\cdot[\cos{\theta_z}-1]$ is equal to the difference in area between the triangle and the shadow it casts on the x-y plane.
$\v{R} = \begin{bmatrix} \v{e}_2 \\ -\v{e}_1 \\ 0\end{bmatrix}$ is a kind of rotation matrix. It projects vectors onto the x-y plane, then rotates them clockwise by 90 degrees about the z-axis.
The gradient $\nabla f$ consists of per-vertex terms like $\v{R}\bullet (\v{b}-\v{c}) + (\v{b}-\v{c})\times \widehat{n}$ (the block for vertex $\v{a}$) that indicate the vector direction in which triangle vertices like $\v{a}$ will move to optimize $f$. The component $(\v{b}-\v{c})\times \widehat{n}$ is easiest to understand: it's a vector pulling $\v{a}$ toward the center of the triangle with a magnitude that depends on the length of the opposite side. This is the component that causes the area of the triangle to shrink. The component $\v{R}\bullet(\v{b}-\v{c})$ is the source of rotation.
So I would suspect that gradient descent and Newton's method actually yield similar results.
- The reason the triangle is not just shrinking is that the gradient has both a rotational and a contracting component. And the more the normal aligns with the $z$-axis, the smaller the total gradient and the less the triangle will change at all.
- The triangle will warp (i.e. the vertices move independently) if the gradient components for $\v{a}$, $\v{b}$, and $\v{c}$ are very different. In fact, you can see that the gradient for each vertex is directly proportional to the length of the opposite side. Therefore, non-equilateral triangles will tend to become more distorted.
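The action of $\mathbf{R}$ described above can be illustrated with a few lines of code. This is only a sketch with made-up names (`R`, `apply_R`): it writes $\mathbf{R}$ as a $3\times 3$ matrix with rows $\mathbf{e}_2$, $-\mathbf{e}_1$, $0$ and checks that applying it to a vector $v$ gives $(v_y, -v_x, 0)$, which is the same as $v \times \mathbf{e}_3$.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Rows of R are e2, -e1 and the zero vector.
R = ((0.0, 1.0, 0.0),
     (-1.0, 0.0, 0.0),
     (0.0, 0.0, 0.0))

def apply_R(v):
    """Matrix-vector product R v."""
    return tuple(sum(row[j] * v[j] for j in range(3)) for row in R)

E3 = (0.0, 0.0, 1.0)
v = (1.0, 2.0, 3.0)

print(apply_R(v))                  # (2.0, -1.0, 0.0): z dropped, xy part turned clockwise
print(cross(v, E3))                # same vector, since R v = v x e3
print(apply_R((1.0, 0.0, 0.0)))    # the x-axis maps to the negative y-axis
```

The last line shows the clockwise quarter turn (seen from above): the unit vector along $x$ is sent to $-y$, and any $z$ component is discarded.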

Notes: My gradient calculation:

First, \begin{align*}\nabla f &= (\langle \v{n}, \v{e}_3\rangle - ||\v{n}||) \cdot \frac{\partial }{\partial \v{n}}(\langle \v{n}, \v{e}_3\rangle - ||\v{n}||) \cdot \left[\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial\v{b}}, \frac{\partial \v{n}}{\partial \v{c}}\right]\\ &= ||\v{n}||(\langle \widehat{\v{n}}, \v{e}_3\rangle - 1) \cdot \left(\v{e}_3 - \widehat{\v{n}}\right) \bullet \left[\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial\v{b}}, \frac{\partial \v{n}}{\partial \v{c}}\right]\\ &= 2A(\cos{(\theta_z)}- 1) \cdot \left(\v{e}_3 - \widehat{\v{n}}\right) \bullet \left[\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial\v{b}}, \frac{\partial \v{n}}{\partial \v{c}}\right]\\ \end{align*}
For a single component of $\nabla \v{n}$:

$$\frac{\partial \v{n}}{\partial \v{a}} = \frac{\partial }{\partial \v{a}} \left[(\v{b}-\v{a})\times(\v{c}-\v{a})\right] = \v{I} \times (\v{b}-\v{c}) = \begin{bmatrix}\v{e}_1 \times (\v{b}-\v{c})\\\v{e}_2 \times (\v{b}-\v{c})\\\v{e}_3 \times (\v{b}-\v{c})\end{bmatrix}$$

(and similarly for the other two triangle vertices; just replace $\v{a}\rightarrow \v{b} \rightarrow \v{c} \rightarrow \v{a}$)
Putting it together, $$\begin{aligned}(\v{e}_3-\widehat{\v{n}}) \bullet \left[\v{I} \times (\v{b}-\v{c}) \right] &= \v{e}_3\bullet \left[\v{I} \times (\v{b}-\v{c}) \right] \;- \;\widehat{\v{n}} \bullet \left[\v{I} \times (\v{b}-\v{c}) \right] \\ &= (\v{e}_3\times \v{I})\bullet (\v{b}-\v{c}) \;+\; \v{I}\cdot (\v{b}-\v{c})\times\widehat{\v{n}}\end{aligned}$$
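A derivation like this is easy to cross-check against finite differences. Contracting $\mathbf{e}_3-\widehat{\mathbf{n}}$ with $\mathbf{I}\times(\mathbf{b}-\mathbf{c})$ via the scalar triple product gives the compact per-vertex form $(\mathbf{b}-\mathbf{c})\times(\mathbf{e}_3-\widehat{\mathbf{n}})$, that is, $\mathbf{R}\bullet(\mathbf{b}-\mathbf{c}) - (\mathbf{b}-\mathbf{c})\times\widehat{\mathbf{n}}$, so the sign of the second term above is worth double-checking. The sketch below compares that compact form, scaled by $\langle \mathbf{n},\mathbf{e}_3\rangle - ||\mathbf{n}||$, with a central-difference gradient. The function names, the test triangle, and the step size are assumptions for illustration.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(sum(t * t for t in u))

def cost(a, b, c):
    n = cross(sub(b, a), sub(c, a))
    return 0.5 * (n[2] - norm(n)) ** 2

def grad(a, b, c):
    """Compact analytic gradient: g * (opposite side) x (e3 - n_hat) per vertex."""
    n = cross(sub(b, a), sub(c, a))
    ln = norm(n)                                   # assumes n != 0
    g = n[2] - ln                                  # <n, e3> - ||n||
    w = (-n[0] / ln, -n[1] / ln, 1.0 - n[2] / ln)  # e3 - n_hat
    sides = (sub(b, c), sub(c, a), sub(a, b))      # blocks for a, b, c
    return [g * t for side in sides for t in cross(side, w)]

def num_grad(a, b, c, h=1e-6):
    """Central finite differences over all 9 coordinates."""
    x = list(a) + list(b) + list(c)
    out = []
    for i in range(9):
        xp = x[:]; xp[i] += h
        xm = x[:]; xm[i] -= h
        out.append((cost(xp[0:3], xp[3:6], xp[6:9])
                    - cost(xm[0:3], xm[3:6], xm[6:9])) / (2 * h))
    return out

a, b, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)
analytic, numeric = grad(a, b, c), num_grad(a, b, c)
max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print(max_err)
```

A small `max_err` supports the compact form for this triangle; the same check can be repeated with either sign of the $\widehat{\mathbf{n}}$ term to see which one matches.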
