Difference: Newton's method, Newton-Raphson method, Gauss Newton-method.


I would appreciate some clarification w.r.t. algorithms for solving nonlinear systems of equations.

1 - I don't understand the difference between Newton's method and the Newton-Raphson method. In [1], Newton's method is defined using the Hessian, but Newton-Raphson is not. However, I suspect they are actually the same thing, since I implemented both and the results were identical across iterations.

2 - From my understanding, the Gauss-Newton method is a particular case of Newton's method used to minimize least-squares residuals. Is this correct?

Thanks in advance!

Peter

[1] - www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf


BEST ANSWER

Newton and Newton-Raphson are just different names for the same method. Sometimes Newton-Raphson is preferred for the scalar/univariate case.

Standard Newton for a vector valued function $F$ (no. equations = no. variables) determines the update step $s=x_+-x$ by solving $F'(x)s=-F(x)$.
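A minimal sketch of that step, using a hypothetical 2×2 system (the functions below are made up for illustration, not taken from [1]):

```python
import numpy as np

# Illustrative system F(x) = 0 with two equations, two unknowns:
#   F1 = x0^2 + x1^2 - 1   (unit circle)
#   F2 = x0 - x1           (diagonal line)
def F(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def Jac(x):  # Jacobian F'(x)
    return np.array([[2*x[0], 2*x[1]],
                     [1.0,    -1.0]])

def newton(x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        # solve F'(x) s = -F(x) for the update step s
        s = np.linalg.solve(Jac(x), -F(x))
        x = x + s
        if np.linalg.norm(s) < tol:
            break
    return x

root = newton(np.array([1.0, 0.5]))
# from this start the iterates converge to (1/sqrt(2), 1/sqrt(2))
```

Note that each iteration solves a square linear system; this only works when the Jacobian is invertible at the iterates.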

Gauß-Newton applies to over-determined systems (no. equations > no. variables) and minimizes the error in $\|F'(x)s+F(x)\|_2$ for the update step. For a square system (no. equations = no. variables, with invertible Jacobian) this reduces to standard Newton.

One could also solve an over-determined system by minimizing the squared error $\|F(x)\|_2^2$. Finding the zeros of its gradient by Newton's method would require second derivatives of $F$, which are not needed for Gauß-Newton.
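To make the contrast concrete, here is a sketch with a hypothetical over-determined system (3 equations, 2 unknowns; the equations are invented for illustration). Each Gauß-Newton step solves the linearization $F'(x)s \approx -F(x)$ in the least-squares sense:

```python
import numpy as np

# Illustrative over-determined system: minimize ||F(x)||_2 for
#   F1 = x0 + x1 - 2,  F2 = x0 - x1,  F3 = x0*x1 - 1
# (this particular system happens to have the exact root (1, 1))
def F(x):
    return np.array([x[0] + x[1] - 2.0, x[0] - x[1], x[0]*x[1] - 1.0])

def Jac(x):  # 3x2 Jacobian F'(x)
    return np.array([[1.0,  1.0],
                     [1.0, -1.0],
                     [x[1], x[0]]])

def gauss_newton(x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        # least-squares solve of F'(x) s = -F(x); no second
        # derivatives of F are ever formed
        s, *_ = np.linalg.lstsq(Jac(x), -F(x), rcond=None)
        x = x + s
        if np.linalg.norm(s) < tol:
            break
    return x

x_star = gauss_newton(np.array([2.0, 0.5]))
```

The only change from standard Newton is that `solve` on a square system becomes `lstsq` on a tall one.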


Lutz Lehmann's answer leaves out an important aspect of Gauss-Newton:

Gauss-Newton is an approximate version of Newton's method, applied to non-linear least squares.

In regular Newton you have:

$$ x_{t+1} = x_t - H^{-1}g $$

Where $g=\nabla_x f(x)$ is the gradient of the function and $H=\nabla_x^2 f(x)$ is its Hessian matrix. For non-linear least squares, the objective is $Loss=\sum_i r_i^2$, where $r_i=y_i-f(x_i;\beta)$ are the residuals (important: note the notation change here, as $f$ now denotes the model rather than the objective; the variable of interest becomes $\beta$, the coefficients of the model, while the $x_i$ are the predictors/covariates).

It can be shown that the gradient for this function is equal to

$$\nabla _\beta Loss = -2J^Tr $$

Where $J$ is the $(n,p)$ Jacobian matrix ($n$=# of observations, $p$=# of variables / coefficients), and $r$ is the vector of residuals.

Gauss-Newton then approximates the Hessian as:

$$ H \approx 2J^T J $$
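The step being skipped here can be made explicit. Differentiating the gradient $-2J^Tr$ once more (using $\nabla_\beta r_i = -J_i^T$, where $J_i$ is the $i$-th row of $J$) gives the exact Hessian:

```latex
\nabla_\beta^2 \, Loss
  = 2\sum_i \nabla_\beta r_i \, \nabla_\beta r_i^T
  + 2\sum_i r_i \, \nabla_\beta^2 r_i
  = 2J^T J + 2\sum_i r_i \, \nabla_\beta^2 r_i .
```

Gauss-Newton drops the second sum, which is negligible when the residuals $r_i$ are small near the solution or the model is nearly linear; this is exactly why the method needs no second derivatives.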

The final update rule thus becomes:

$$ \beta_{t+1} = \beta_t + (J^TJ)^{-1}J^Tr $$
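A minimal sketch of this update rule on a made-up one-parameter model $y \approx e^{b x}$ (the data and model here are hypothetical, chosen so the fit is exact):

```python
import numpy as np

# Noise-free data generated from y = exp(0.7 * x), so the
# residuals vanish at the solution b = 0.7
x_data = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_data = np.exp(0.7 * x_data)

def residuals(b):
    # r_i = y_i - f(x_i; b) with model f(x; b) = exp(b x)
    return y_data - np.exp(b * x_data)

def jacobian(b):
    # J_i = d f(x_i; b) / d b, shaped (n, p) with p = 1
    return (x_data * np.exp(b * x_data)).reshape(-1, 1)

b = np.array([0.5])  # starting guess
for _ in range(30):
    J = jacobian(b[0])
    r = residuals(b[0])
    # Gauss-Newton update: beta <- beta + (J^T J)^{-1} J^T r
    step = np.linalg.solve(J.T @ J, J.T @ r)
    b = b + step
    if np.linalg.norm(step) < 1e-12:
        break
```

In practice one would solve the normal equations via a QR or least-squares routine rather than forming $(J^TJ)^{-1}$ explicitly, but the plain form above matches the update rule as written.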

The approximation can also be thought of as replacing the model with its first-order Taylor expansion in $\beta$ and computing the exact Hessian of that linearized problem.

I have a YouTube video explaining more, which you can find here.