Derivation of Newton-Raphson method in higher dimensions


Could someone please provide and explain the derivation of Newton-Raphson method in higher dimensions?

The derivation of this method from the definition of the derivative is intuitive in one dimension, but I don't understand how that derivation extends to higher dimensions.


Best answer:

You are given a function $f$ of type ${\mathbb R}^n\to{\mathbb R}^n$, defined on some open set $\Omega$, and you want to solve the equation $f(x)=0$. You suspect that the point $p\in\Omega$ is already quite near a solution $\xi$ of this equation. You then argue as follows: for small increments $X$ attached at $p$ one has $$f(p+X)=f(p)+df(p).X+ o(X)\qquad (X\to0)\ .\tag{1}$$

The aim is to choose $X$ such that $f(p+X)=0$. Neglecting the error term in $(1)$, this amounts to solving the linear equation (resp., system of $n$ equations in $n$ unknowns) $$df(p).X=-f(p)\ .$$ Assuming the Jacobian $df(p)$ is invertible, one obtains the solution $$X=-\bigl(df(p)\bigr)^{-1}.f(p)\ ,$$ so that one is led to propose $$q:= p-\bigl(df(p)\bigr)^{-1}.f(p)\tag{2}$$ as a better approximation to $\xi$ than $p$ was.

This idea leads to the following iterative algorithm (writing $k$ for the iteration index, since $n$ already denotes the dimension): $$x_0:=p,\qquad x_{k+1}:= x_k-\bigl(df(x_k)\bigr)^{-1}.f(x_k)\qquad(k\geq0)\ .$$ Of course it is a grand project (i) to set up exact assumptions that guarantee $\lim_{k\to\infty}x_k=\xi$, and (ii) to analyze the speed of convergence in case it actually takes place.
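The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the answer: the test system $f$ and its Jacobian `df` are invented for the example, and in practice one solves the linear system $df(x).X=-f(x)$ rather than forming the inverse Jacobian.

```python
import numpy as np

def newton(f, df, p, tol=1e-12, max_iter=50):
    """Iterate x <- x - df(x)^{-1} f(x) starting from the guess p."""
    x = np.asarray(p, dtype=float)
    for _ in range(max_iter):
        # Solve the linear system df(x) . X = -f(x); this is the
        # step (2) from the derivation, without inverting df(x).
        X = np.linalg.solve(df(x), -f(x))
        x = x + X
        if np.linalg.norm(X) < tol:
            break
    return x

# Illustrative system: f(x, y) = (x^2 + y^2 - 1, x - y),
# with a root at (1/sqrt(2), 1/sqrt(2)).
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
df = lambda v: np.array([[2 * v[0], 2 * v[1]],
                         [1.0,      -1.0]])

root = newton(f, df, p=[1.0, 0.5])
```

Starting from $(1,\,0.5)$ the iterates settle on $(1/\sqrt2,\,1/\sqrt2)$ in a handful of steps, consistent with the quadratic convergence mentioned in point (ii).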

Another answer:

Let us say that you need to solve $$f(x,y)=0 \qquad g(x,y)=0$$ and you have a starting point $(x_0,y_0)$. Just as for Newton's method with a single variable, approximate each function by its first-order Taylor expansion (or its tangent plane, if you prefer).

So, ignoring higher order terms,

$$f(x,y)\approx f(x_0,y_0)+(x-x_0) \frac{\partial f(x,y)}{\partial x}+(y-y_0) \frac{\partial f(x,y)}{\partial y}$$ $$g(x,y)\approx g(x_0,y_0)+(x-x_0) \frac{\partial g(x,y)}{\partial x}+(y-y_0) \frac{\partial g(x,y)}{\partial y}$$ all partial derivatives being computed at $(x_0,y_0)$. So, locally, let us write $$f(x_0,y_0)+(x-x_0) \frac{\partial f(x,y)}{\partial x}+(y-y_0) \frac{\partial f(x,y)}{\partial y}=0$$ $$g(x_0,y_0)+(x-x_0) \frac{\partial g(x,y)}{\partial x}+(y-y_0) \frac{\partial g(x,y)}{\partial y}=0$$ So you have two linear equations to solve (the variables being $x-x_0$ and $y-y_0$); get $x$ and $y$ and repeat the process for the next step using $x_0=x$ and $y_0=y$; do that as long as the convergence criterion is not satisfied.
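The two linear equations above form a $2\times2$ system in the increments $x-x_0$ and $y-y_0$, which can be solved directly, e.g. by Cramer's rule. The sketch below is illustrative only: the particular $f$, $g$, and their partial derivatives are invented for the example (the system $x^2+y-3=0$, $x+y^2-5=0$ has the exact solution $(1,2)$).

```python
def newton_step_2d(f, g, fx, fy, gx, gy, x0, y0):
    """One Newton step for the system f = 0, g = 0 in two variables."""
    a, b = fx(x0, y0), fy(x0, y0)   # partials of f at (x0, y0)
    c, d = gx(x0, y0), gy(x0, y0)   # partials of g at (x0, y0)
    det = a * d - b * c             # must be nonzero for the step
    rf, rg = -f(x0, y0), -g(x0, y0)
    # Cramer's rule for the increments dx = x - x0, dy = y - y0.
    dx = (rf * d - b * rg) / det
    dy = (a * rg - rf * c) / det
    return x0 + dx, y0 + dy

# Illustrative system: f(x, y) = x^2 + y - 3, g(x, y) = x + y^2 - 5.
f = lambda x, y: x**2 + y - 3
g = lambda x, y: x + y**2 - 5
fx = lambda x, y: 2 * x
fy = lambda x, y: 1.0
gx = lambda x, y: 1.0
gy = lambda x, y: 2 * y

x, y = 1.0, 1.0
for _ in range(20):
    x, y = newton_step_2d(f, g, fx, fy, gx, gy, x, y)
```

Each call solves the linearized system at the current point and takes the solution of the tangent-plane equations as the next iterate, exactly as described above.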

I am sure that you see how this generalizes to $n$ equations and variables.