Could someone please provide and explain the derivation of Newton-Raphson method in higher dimensions?
The derivation of this method from the definition of the derivative in one dimension is intuitive, but I don't understand how it extends to higher dimensions.
You are given a function $f$ of type ${\mathbb R}^n\to{\mathbb R}^n$, defined on some open set $\Omega$, and you want to solve the equation $f(x)=0$. You suspect that the point $p\in\Omega$ is already quite near a solution $\xi$ of this equation. You then argue as follows: for small increments $X$ attached at $p$ one has $$f(p+X)=f(p)+df(p).X+ o(X)\qquad (X\to0)\ .\tag{1}$$ The aim is to choose $X$ such that $f(p+X)=0$. Neglecting the error term in $(1)$, this amounts to solving the linear equation (resp., the system of $n$ equations in $n$ unknowns) $$df(p).X=-f(p)\ .$$ Technical assumptions aside (in particular, the Jacobian $df(p)$ must be invertible), one obtains the solution $$X=-\bigl(df(p)\bigr)^{-1}.f(p)\ ,$$ so that one is led to proposing $$q:= p-\bigl(df(p)\bigr)^{-1}.f(p)\tag{2}$$ as a better approximation to $\xi$ than $p$ was.

This idea leads to the following iterative algorithm: $$x_0:=p,\qquad x_{k+1}:= x_k-\bigl(df(x_k)\bigr)^{-1}.f(x_k)\qquad(k\geq0)\ .$$ Of course it is a grand project (i) to set up exact assumptions that guarantee $\lim_{k\to\infty}x_k=\xi$, and (ii) to analyze the speed of convergence in case it actually takes place.
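To make the iteration concrete, here is a minimal sketch in Python with NumPy. The function `f`, its Jacobian `df`, the starting point, and the tolerance are illustrative choices, not part of the answer above; note also that in practice one solves the linear system $df(x).X=-f(x)$ directly rather than forming the inverse matrix.

```python
import numpy as np

def newton(f, df, p, tol=1e-12, max_iter=50):
    """Multidimensional Newton iteration x_{k+1} = x_k - df(x_k)^{-1} f(x_k)."""
    x = np.asarray(p, dtype=float)
    for _ in range(max_iter):
        # Solve df(x) . step = f(x), so that the Newton increment is X = -step.
        step = np.linalg.solve(df(x), f(x))
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Illustrative example: solve x^2 + y^2 = 4 and x*y = 1 simultaneously.
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
df = lambda v: np.array([[2*v[0], 2*v[1]],
                         [v[1],   v[0]]])
root = newton(f, df, p=[2.0, 0.5])
```

Starting from the nearby guess $(2,\,0.5)$, the iteration converges rapidly to a root of this system, illustrating the quadratic convergence one hopes to establish in part (ii).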