The Newton-Raphson method in Banach spaces

477 Views Asked by At

Note: the screenshot at the bottom is where my question comes from.

This question is quite different from other versions of conditions of convergence of Newton iteration. For example, Kantorovich theorem.

I am now analysing the Newton-Raphson iteration in general Banach spaces $E,F$. Let $x_0\in E$, and let $f:B_t(x_0)\to F$ be a differentiable function. ($B$ denotes an open ball with radius $t$.) $L(E,F)$ is the set of linear mapping from $E$ to $F$.

By definition, $f$ is differentiable at $x$ with derivative $Df_x\in L(E,F)$ (which is a linear functional from $E$ to $F$) if $\exists r(h),f(x+h)=f(x)+Df_x(h)+r(h)$, where $r(h)/\|h\|\to 0$ as $h\to 0$.

To make it simple, I assume that there exist $s>0$ such that

  • $\|f(x_0)\|\leq t/(2s)$
  • If $x,y\in B_t(x_0)$ then $\|Df_x-Df_y\|\leq 1/(2s)$
  • $\forall x\in B_t(x_0),\exists J_x\in L(F,E)$ such that $J_xDf_x=Df_xJ_x=I_E$ and $\|J_x\|\leq s$.

Now let's work on the iteration. Let's fix $x\in B_t(x_0)$. Set $x_n=x_{n-1}-J_x(f(x_{n-1}))$. In real analysis course, we often take $x=x_{n-1}$, but here I have to fix $x$ to be anything in $B_t(x_0)$. Just assume for a moment that $\forall x\in B_t(x_0)$. I will explain why later.

Firstly I have to show that $x_n$ converges. Now I can use the inequality $$ \|f(a)-f(b)-T(a-b)\|\leq \|a-b\|\sup_{c\in [a,b]} \|Df_c-T\|, $$ where $[a,b]$ is the line segment joining $a,b$, and $T\in L(E,F)$.

To use this inequality, we define $g(y)=J_x(f(y))$, so $x_n=x_{n-1}-g(x_{n-1})$, and $Dg_y=J_xDf_y$.(The reason why I cannnot set $x=x_{n-1}$ is that if I do it that way, then $g(y)=J_y(f(y))$, and I cannot find the derivative of $g$ in this case.) Since $x$ is fixed, we can assume there is NO $x$ dependence in $g$. Therefore, $$ \|x_{n+1}-x_{n}\|=\|f(x_{n})-f(x_{n-1})-(x_{n}-x_{n-1})\|\\ \leq \|x_{n}-x_{n-1}\|\sup_{c\in [x_n,x_{n-1}]} \|Dg_c-I\|\\=\|x_{n}-x_{n-1}\|\sup_{c\in [x_n,x_{n-1}]} \|J_xDf_c-J_xDf_x\|\\ \leq \|x_{n}-x_{n-1}\|\|J_x\|\|Df_c-Df_x\|\\ \leq \frac{1}{2} \|x_{n}-x_{n-1}\|. $$ Also, $$ \|x_1-x_0\|=\|J_x(f(x_0))\|\leq t/2 $$ The conclusion is $\|x_n-x_{n-1}\|\leq t/2^n$.

My question: is it really OK to let $x$ be anything fixed in $B_t(x_0)$? Does that really work? If it is wrong, how can I fix it?


To prove that $f(x_n)$ converges to zero, I feel that I should prove something like $\|f(x_n)\|\leq t/(2^{n+1}s)$(Suggested in a book of real analysis). I try to start by considering this: $$ \|f(x_n)\|\leq \|Df_x\|\|x_{n+1}-x_n\| $$ but it goes nowhere. From $\|J_x\|\leq s$ we cannot obtain an upper bound on $Df_x$.

So how can I prove $\|f(x_n)\|\leq t/(2^{n+1}s)$?


It should be clear that $x_n$ is a Cauchy sequence - but it might not converge into $B_t(x_0)$ - is that a problem?

It is a long question, so if I have made mistakes please point it out.

Please look at the following screenshot if the above is not clear.


Source of my problem: A course in mathematical analysis (screenshot) enter image description here

Here is a theorem of Kantorovich which is related but not the same.

1

There are 1 best solutions below

1
On
**My question: is it really OK to let $x$ be anything fixed in $B_t(x_0)$? Does that really work? If it is wrong, how can I fix it?**

Yes, that is fine. Actually, you can re-phrase the definition of the sequence $(x_n)$ as a fixed-point iteration of the map $h := I - Df_x^{-1}\circ f : X \rightarrow X$ (i.e. $h(z):=z-Df_x^{-1}(f(z))$). Then these conditions are sufficient to prove that $h$ is a contraction:

$$ ||h(y)-h(z)|| \le ||y-z|| \sup_{c\in[y,z]} ||Dh_c|| $$

since

$$ ||Dh_c|| = || I - Df_x^{-1}\circ Df_c|| = || Df_x^{-1} ( Df_x - Df_c) || \le || Df_x^{-1} || \cdot || Df_x - Df_c || \le \frac{s}{2s} = \frac{1}{2}. $$

This shows that for each $x\in B_t(x_0)$ gives rise to a contraction $h$ so that you might apply Banach's fixed point theorem. And a fixed point $z$ corresponds to solution of $f(z)=0$.

**So how can I prove $\|f(x_n)\|\leq t/(2^{n+1}s)$?**

As you mentioned we have $x_{n+1}=x_n - J_x(f(x_n))$, hence $$ \| f(x_n) \| \le \| Df_x \| \|x_{n+1} - x_n\| $$

where $\|Df_x\|$ is something constant. Hence if $(x_n)$ is a Cauchy sequence, so is $(f(x_n))$. Therefore we get $\|f(x_n)\| \le C / 2^n$ but the constant might not be the same as claimed.

**It should be clear that $x_n$ is a Cauchy sequence - but it might not converge into $B_t(x_0)$ - is that a problem?**

Yes, this is a problem. I don't know if these conditions somehow enforce the limit to be inside $B_t(x_0)$. It might be on the boundary. What you could do: Either you continuously extend $f$ to the boundary or you assume e.g. something like $\|f(x_0)\| < t/(2s)$. Then the limit would be inside the ball.

I don't know if it is possible to really prove what the author of this text book claims. Maybe he was a little bit sloppy. But at least qualitatively the claims are fine ($\|f(x_n)\|$ up to a constant and the limits being a zero of $f$ after continuation).