Newton's Method for a Scalar Field


I understand how Newton's method arises naturally from the linear Taylor approximation of the function.

For $f: \mathbb{R}\to \mathbb{R}$ we have $$ x_{new}=x+\Delta x \\ f(x+\Delta x)=f(x)+f'(x)\Delta x=0 \\ f'(x)\Delta x=-f(x) \\ \Delta x=-\frac{f(x)}{f'(x)} \\ x_{new}=x-\frac{f(x)}{f'(x)} $$
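The one-dimensional update above can be sketched in a few lines of Python (the function names and the example $f(x) = x^2 - 2$ are my own, chosen for illustration, not from the question):

```python
def newton_1d(f, fprime, x, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until |f(x)| < tol."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)
    return x

# Example: solve x^2 - 2 = 0; the positive root is sqrt(2).
root = newton_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, x=1.0)
```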

For a vector field $\textbf{F}: \mathbb{R}^{n}\to \mathbb{R}^{n} $ we have $$ \textbf{x}_{new}=\textbf{x}+\Delta \textbf{x} \\ \textbf{F}(\textbf{x}+\Delta \textbf{x})=\textbf{F}(\textbf{x})+\textbf{J}(\textbf{x})\Delta\textbf{x}=0 \\ \textbf{J}(\textbf{x})\Delta\textbf{x}=-\textbf{F}(\textbf{x}) \\ \Delta\textbf{x}=-\textbf{J}(\textbf{x})^{-1}\textbf{F}(\textbf{x}) \\ \textbf{x}_{new}=\textbf{x}-\textbf{J}(\textbf{x})^{-1}\textbf{F}(\textbf{x}) $$
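The vector version can be sketched similarly; a minimal Python/numpy version (names and the example system are my own) solves the linear system $\textbf{J}(\textbf{x})\Delta\textbf{x}=-\textbf{F}(\textbf{x})$ rather than forming the inverse explicitly, which is the standard numerical practice:

```python
import numpy as np

def newton_nd(F, J, x, tol=1e-12, max_iter=50):
    """Newton's method for F: R^n -> R^n.
    Each step solves J(x) dx = -F(x) instead of computing J(x)^{-1}."""
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        x = x + np.linalg.solve(J(x), -Fx)
    return x

# Example system: a point on the unit circle with equal coordinates.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
root = newton_nd(F, J, np.array([1.0, 0.5]))  # converges toward (1/sqrt(2), 1/sqrt(2))
```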

I'm working with a scalar field, but I can't figure out Newton's method for it.

For a scalar field $f: \mathbb{R}^{n}\to \mathbb{R}$ $$ \textbf{x}_{new}=\textbf{x}+\Delta \textbf{x} \\ f(\textbf{x}+\Delta \textbf{x})=f(\textbf{x})+\nabla f(\textbf{x})\cdot \Delta\textbf{x}=0 \\ \nabla f(\textbf{x})\cdot \Delta\textbf{x}=-f(\textbf{x}) \\ ? $$

Neither a dot product nor a non-square matrix has an inverse. Am I right that there is no Newton's method for scalar fields, because the derivation via the Taylor approximation leads to the underdetermined equation $\nabla f(\textbf{x})\cdot \Delta\textbf{x}=-f(\textbf{x})$, which has infinitely many solutions? Can we still somehow use Newton's method, or something like it, for this kind of problem?

2 Answers

Best answer:

As you correctly identified, when you have a scalar field the system $$ f(x) + \nabla f(x) \cdot \Delta x = 0 $$ has an infinite number of solutions. One approach is to choose the "nearest" solution, in the sense that $$ x_{\text{new}} = \arg\min_{y} \left\{\|y - x\|^2 \mid f(x) + \nabla f(x) \cdot (y - x) = 0 \right\}. $$ The program above yields a closed-form update, given by $$ x_{\text{new}} = x - \frac{f(x)}{\|\nabla f(x)\|^2} \cdot \nabla f(x). \qquad (\dagger) $$ The update $(\dagger)$ is sometimes known in the literature as the Polyak step size.
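The update $(\dagger)$ can be sketched in Python as follows (the function name and the example level-set problem $x_0^2 + x_1^2 = 4$ are mine, for illustration only):

```python
import numpy as np

def polyak_step_method(f, grad, x, tol=1e-12, max_iter=100):
    """Minimum-norm Newton step for a scalar field f: R^n -> R:
    x <- x - f(x) / ||grad f(x)||^2 * grad f(x)."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        g = grad(x)
        x = x - (fx / np.dot(g, g)) * g
    return x

# Example: find a point on the circle x0^2 + x1^2 = 4.
f = lambda v: v[0]**2 + v[1]**2 - 4.0
grad = lambda v: np.array([2.0 * v[0], 2.0 * v[1]])
x = polyak_step_method(f, grad, np.array([3.0, 1.0]))
```

Note that each iterate moves along the gradient direction, so for this radially symmetric example the method reduces to Newton's method on the radius.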

Second answer:

Let us consider the more general problem of solving $$g(x) = 0$$ where $g : \mathbb{R}^n \rightarrow \mathbb{R}^m$ and there are at least as many unknowns as equations, i.e., $m \leq n$. The Jacobian $Dg(x) \in \mathbb{R}^{m \times n}$ of $g$ at the point $x \in \mathbb{R}^n$ is the $m$ by $n$ matrix of first-order partial derivatives. We shall write $\text{Ker}(A)$ and $\text{Ran}(A)$ to denote the kernel (a.k.a. null space) and the range (a.k.a. column space) of the matrix $A$.

Now let us derive Newton's method from scratch. If $g$ is differentiable at $x \in \mathbb{R}^n$, then by definition we have $$ g(x + \Delta x) = g(x) + Dg(x) \Delta x + o(\|\Delta x \|), \quad \Delta x \rightarrow 0. $$ We therefore seek to solve the central linear equation $$ 0 = g(x) + Dg(x) y $$ with respect to $y \in \mathbb{R}^n$. If $n > m$, then there are potentially infinitely many solutions of this equation. In particular, if $y \in \mathbb{R}^n$ solves the central equation and $z \in \text{Ker}~Dg(x)$, then $y + z$ is also a solution, because $$ 0 = g(x) + Dg(x) y = g(x) + Dg(x)(y + z). $$ Conversely, if $y_1$ and $y_2$ both solve the central equation, then $y_1 - y_2 \in \text{Ker}~Dg(x)$.

Now, if $y \in \mathbb{R}^n$ is any vector, then there exists exactly one vector $$ u \in \text{Ker}~Dg(x) $$ and exactly one vector $$ v \in \left( \text{Ker}~Dg(x) \right) ^\perp = \text{Ran}~Dg(x)^T $$ such that $$ y = u + v. $$ For this reason, we limit our search for a solution of the central equation to vectors $y \in \text{Ran}~Dg(x)^T$. If $y \in \text{Ran}~Dg(x)^T$, then there exists $\lambda \in \mathbb{R}^m$ such that $$ y = Dg(x)^T \lambda. $$ We see that $y$ solves the central equation if and only if $$ 0 = g(x) + Dg(x) Dg(x)^T \lambda. $$

This is the reason why Newton's method takes the form $$ x_{k+1} = x_k - Dg(x_k)^T s_k $$ where $s_k \in \mathbb{R}^m$ solves the linear system $$ Dg(x_k) Dg(x_k)^T s_k = g(x_k). $$ The central matrix $$ A_k = Dg(x_k) Dg(x_k)^T \in \mathbb{R}^{m \times m} $$ is nonsingular precisely when the Jacobian $Dg(x_k)$ has full rank $m$.


In the OP's case $m=1$, and the Jacobian has full rank exactly when the gradient is not the zero vector.
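The construction above can be sketched in Python/numpy (the function name and the example with $m = 2$ equations in $n = 3$ unknowns are my own illustration, not from the answer):

```python
import numpy as np

def newton_underdetermined(g, Dg, x, tol=1e-12, max_iter=50):
    """Newton step restricted to Ran(Dg(x)^T): solve Dg Dg^T s = g(x),
    then update x <- x - Dg(x)^T s (the minimum-norm step)."""
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        J = Dg(x)
        s = np.linalg.solve(J @ J.T, gx)
        x = x - J.T @ s
    return x

# Example: a point on the sphere of radius sqrt(3) whose first two
# coordinates agree (m = 2 equations, n = 3 unknowns).
g = lambda v: np.array([v[0]**2 + v[1]**2 + v[2]**2 - 3.0, v[0] - v[1]])
Dg = lambda v: np.array([[2.0 * v[0], 2.0 * v[1], 2.0 * v[2]],
                         [1.0, -1.0, 0.0]])
x = newton_underdetermined(g, Dg, np.array([1.5, 0.5, 1.0]))
```

For $m = 1$ the matrix $Dg\,Dg^T$ is the scalar $\|\nabla f\|^2$ and this reduces to the Polyak-style update from the accepted answer.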