I have the following function defined from $\mathbb{R}^9 \to \mathbb{R}$
\begin{equation} f(x) = f(x_1,x_2,x_3) = \frac{1}{2}(\left\langle n,e_3 \right\rangle - \lVert n \rVert)^2 \end{equation}
Basically it's a function of the vertices of a given triangle and we also have $e_3 = (0,0,1)^T$ and $n = (x_2 - x_1)\times (x_3 - x_1)$. Computing the gradient w.r.t. $x$ leads me to
$$ \nabla f = \left(\left\langle \frac{n}{\lVert n \rVert} ,e_3 \right\rangle - 1\right)^2 \begin{pmatrix} (x_2 - x_3) \times n \\ (x_3 - x_1) \times n \\ (x_1 - x_2) \times n \end{pmatrix} $$
I'm trying to give a rigorous interpretation of that gradient. If I set it to $0$ I can get the minimum (though there's an issue with the $\lVert n \rVert$ term). I've implemented this gradient in MATLAB to minimize the function, and this is what I'm getting:
The algorithm MATLAB is using is Newton's method. It may be hard to see, but the triangle seems to have been rotated so that it is parallel to the xy plane.
However, if I used gradient descent instead of Newton's method, then, if I'm reading the gradient correctly, it seems to me the three vertices would be shrunk until the triangle degenerates to a single point.
Why is the Newton method attempting to rotate instead of warping the triangle?

$\newcommand{\v}{\mathbf}$ According to my math (see end of post), I get a slightly different answer for the gradient, which may explain the unexpected behavior:
$$(\nabla f)(\v{a},\v{b},\v{c}) = 2A {[\cos{(\theta_z)}-1]} \cdot \begin{bmatrix}\v{R}\bullet (\v{b}-\v{c}) \,-\, (\v{b}-\v{c})\times \widehat{n}\\\v{R}\bullet (\v{c}-\v{a}) \,-\, (\v{c}-\v{a})\times\widehat{n}\\\v{R}\bullet (\v{a}-\v{b}) \,-\, (\v{a}-\v{b})\times \widehat{n}\end{bmatrix}$$
Here, $A$ is the area of the triangle ($A = \lVert n \rVert/2$), and $\theta_z$ is the angle between the triangle normal and the $z$-axis ($\cos{\theta_z} = \langle \widehat{n}, \v{e}_3\rangle$). Importantly, this shows geometrically that the magnitude of change depends on the area of the triangle and its orientation: the smaller the area, or the closer the triangle is to being parallel to the x-y plane (with its normal pointing along $+z$), the less change there is. In particular, the area will quit shrinking once the triangle is parallel to the x-y plane.
The quantity $A\cdot[\cos{\theta_z}-1]$ equals the signed area of the shadow the triangle casts on the x-y plane minus the area of the triangle itself, so it is never positive.
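As a short worked step (using only $\lVert n \rVert = 2A$ and $\langle n, \v{e}_3\rangle = 2A\cos{\theta_z}$ from the definitions above), the objective itself can be written in terms of the same two quantities:

$$f = \frac{1}{2}\left(2A\cos{\theta_z} - 2A\right)^2 = 2A^2\left(\cos{\theta_z} - 1\right)^2$$

So $f$ vanishes exactly when $A = 0$ (the triangle has collapsed) or $\cos{\theta_z} = 1$ (the normal points along $+z$); both shrinking and rotating reduce it.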
$\v{R} = \begin{bmatrix} \v{e}_2 \\ -\v{e}_1 \\ 0\end{bmatrix}$ is a kind of rotation matrix. It projects vectors onto the x-y plane, then rotates them clockwise by 90 degrees about the z-axis.
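Written out for a generic vector $\v{v} = (v_x, v_y, v_z)^T$ (a small check, with $\v{v}$ introduced here only for illustration), the action of $\v{R}$ is the same as a cross product with $\v{e}_3$:

$$\v{R}\bullet\v{v} = \begin{bmatrix} v_y \\ -v_x \\ 0 \end{bmatrix} = \v{v}\times\v{e}_3$$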
Each block of $\nabla f$ has the form $(\v{e}_3 - \widehat{n}) \bullet \frac{\partial \v{n}}{\partial \v{a}} = \v{R}\bullet (\v{b}-\v{c}) - (\v{b}-\v{c})\times \widehat{n}$. Because the prefactor $2A[\cos{(\theta_z)}-1]$ is never positive, a descent step moves the vertex $\v{a}$ along this vector. The component $-(\v{b}-\v{c})\times \widehat{n}$ is easiest to understand: it lies in the plane of the triangle, perpendicular to the opposite side, and pulls $\v{a}$ toward that side with a magnitude proportional to the side's length. This is the component that causes the area of the triangle to shrink. The component $\v{R}\bullet(\v{b}-\v{c})$ is the source of rotation.
So I would suspect that gradient descent and Newton's method actually yield similar results.
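One way to probe this suspicion is to run plain gradient descent and watch how the area and $\cos\theta_z$ evolve. The following is only a sketch in pure Python, not the MATLAB setup from the question: the starting triangle, the iteration count, and the step-halving rule are illustrative assumptions, and the gradient used is the closed form $\nabla_{\v{a}} f = (\langle n, \v{e}_3\rangle - \lVert n\rVert)\,(\v{b}-\v{c})\times(\v{e}_3 - \widehat{n})$ (cyclically for $\v{b}$ and $\v{c}$).

```python
import math

def sub(u, v):
    return [u[i] - v[i] for i in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(u):
    return math.sqrt(sum(x * x for x in u))

def normal(tri):
    a, b, c = tri
    return cross(sub(b, a), sub(c, a))

def f(tri):
    n = normal(tri)
    return 0.5 * (n[2] - norm(n)) ** 2

def grad(tri):
    a, b, c = tri
    n = normal(tri)
    ln = norm(n)
    if ln < 1e-12:  # degenerate triangle: the gradient is not defined, return zeros
        return [[0.0] * 3 for _ in range(3)]
    g = n[2] - ln                                   # <n, e3> - ||n||
    w = [-n[0] / ln, -n[1] / ln, 1.0 - n[2] / ln]   # e3 - n_hat
    sides = (sub(b, c), sub(c, a), sub(a, b))
    return [[g * x for x in cross(d, w)] for d in sides]

def descend(tri, iters=200):
    """Gradient descent; the step is halved until f strictly decreases."""
    history = [f(tri)]
    for _ in range(iters):
        gr = grad(tri)
        step = 1.0
        while step > 1e-12:
            trial = [[tri[i][j] - step * gr[i][j] for j in range(3)]
                     for i in range(3)]
            if f(trial) < history[-1]:
                break
            step *= 0.5
        else:
            break  # no decreasing step found: stop
        tri = trial
        history.append(f(trial))
    return tri, history

def report(label, tri):
    n = normal(tri)
    ln = norm(n)
    cos_z = n[2] / ln if ln > 0 else float("nan")
    print(label, "area =", ln / 2, "cos(theta_z) =", cos_z, "f =", f(tri))

# Hypothetical starting triangle, tilted out of the x-y plane.
start = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.0, 1.0, 0.8]]
final, history = descend(start)
report("start:", start)
report("final:", final)
```

Comparing the two printed lines shows how much of the decrease in $f$ came from the area shrinking and how much from the normal turning toward $\v{e}_3$ for this particular start; other starting triangles or step rules may behave differently.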
Notes: My gradient calculation:
First, \begin{align*}\nabla f &= (\langle \v{n}, \v{e}_3\rangle - ||\v{n}||) \cdot \frac{\partial }{\partial \v{n}}(\langle \v{n}, \v{e}_3\rangle - ||\v{n}||) \cdot \left[\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial\v{b}}, \frac{\partial \v{n}}{\partial \v{c}}\right]\\ &= ||\v{n}||(\langle \widehat{\v{n}}, \v{e}_3\rangle - 1) \cdot \left(\v{e}_3 - \widehat{\v{n}}\right) \bullet \left[\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial\v{b}}, \frac{\partial \v{n}}{\partial \v{c}}\right]\\ &= 2A(\cos{(\theta_z)}- 1) \cdot \left(\v{e}_3 - \widehat{\v{n}}\right) \bullet \left[\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial\v{b}}, \frac{\partial \v{n}}{\partial \v{c}}\right]\\ \end{align*}
For a single component of $\nabla \v{n}$:
$$\frac{\partial \v{n}}{\partial \v{a}} = \frac{\partial }{\partial \v{a}} \left[(\v{b}-\v{a})\times(\v{c}-\v{a})\right] = \v{I} \times (\v{b}-\v{c}) = \begin{bmatrix}\v{e}_1 \times (\v{b}-\v{c})\\\v{e}_2 \times (\v{b}-\v{c})\\\v{e}_3 \times (\v{b}-\v{c})\end{bmatrix}$$
(and similarly for the other two triangle vertices; just replace $\v{a}\rightarrow \v{b} \rightarrow \v{c} \rightarrow \v{a}$)
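The derivative of $\v{n}$ can be checked by perturbing $\v{a}$ by a small vector $\v{\delta}$ (a symbol introduced here only for this worked step). Since $\v{\delta}\times\v{\delta} = 0$, the expansion is exact:

$$(\v{b}-\v{a}-\v{\delta})\times(\v{c}-\v{a}-\v{\delta}) = \v{n} + \v{\delta}\times(\v{b}-\v{a}) - \v{\delta}\times(\v{c}-\v{a}) = \v{n} + \v{\delta}\times(\v{b}-\v{c})$$

Taking $\v{\delta} = t\,\v{e}_i$ and differentiating with respect to $t$ gives the rows $\v{e}_i \times (\v{b}-\v{c})$.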
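Putting it together, $$\begin{aligned}(\v{e}_3-\widehat{\v{n}}) \bullet \left[\v{I} \times (\v{b}-\v{c}) \right] &= \v{e}_3\bullet \left[\v{I} \times (\v{b}-\v{c}) \right] \;- \;\widehat{\v{n}} \bullet \left[\v{I} \times (\v{b}-\v{c}) \right] \\ &= (\v{e}_3\times \v{I})\bullet (\v{b}-\v{c}) \;-\; \v{I}\cdot (\v{b}-\v{c})\times\widehat{\v{n}}\end{aligned}$$ where $\v{e}_3\times \v{I} = \v{R}$. The sign of the second term follows from the scalar triple product identity $\widehat{\v{n}}\cdot\left(\v{e}_i\times(\v{b}-\v{c})\right) = \v{e}_i\cdot\left((\v{b}-\v{c})\times\widehat{\v{n}}\right)$.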
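A derivation like this is easy to get wrong by a sign, so a finite-difference comparison is a useful sanity check. The sketch below (pure Python; the sample triangle and step size `h` are arbitrary assumptions) compares central differences of $f$ against the closed form $\nabla_{\v{a}} f = (\langle n, \v{e}_3\rangle - \lVert n\rVert)\,(\v{b}-\v{c})\times(\v{e}_3 - \widehat{n})$, with $\v{a}\rightarrow \v{b} \rightarrow \v{c} \rightarrow \v{a}$ for the other two blocks.

```python
import math

def sub(u, v):
    return [u[i] - v[i] for i in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(u):
    return math.sqrt(sum(x * x for x in u))

def f(a, b, c):
    n = cross(sub(b, a), sub(c, a))
    return 0.5 * (n[2] - norm(n)) ** 2

def grad_closed_form(a, b, c):
    n = cross(sub(b, a), sub(c, a))
    ln = norm(n)                                    # assumed nonzero here
    g = n[2] - ln                                   # <n, e3> - ||n||
    w = [-n[0] / ln, -n[1] / ln, 1.0 - n[2] / ln]   # e3 - n_hat
    sides = (sub(b, c), sub(c, a), sub(a, b))
    return [[g * x for x in cross(d, w)] for d in sides]

def grad_finite_diff(a, b, c, h=1e-6):
    verts = [list(a), list(b), list(c)]
    out = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            orig = verts[i][j]
            verts[i][j] = orig + h
            f_plus = f(*verts)
            verts[i][j] = orig - h
            f_minus = f(*verts)
            verts[i][j] = orig                      # restore the coordinate
            out[i][j] = (f_plus - f_minus) / (2 * h)
    return out

# Arbitrary non-degenerate sample triangle.
a, b, c = [0.1, -0.2, 0.3], [1.0, 0.4, -0.5], [0.2, 1.3, 0.7]
exact = grad_closed_form(a, b, c)
approx = grad_finite_diff(a, b, c)
max_err = max(abs(exact[i][j] - approx[i][j])
              for i in range(3) for j in range(3))
print("largest componentwise difference:", max_err)
```

If the printed difference is small compared with the gradient entries, the closed form and the numerical derivative agree at this sample point; the same check can be pointed at any other candidate formula.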