Clarification on justification for fixed-point iteration


I'm trying to understand why fixed-point iteration works. I came across this answer by user "JiK":

Let us assume that function $g$ is defined on an interval $(a,b)$, $g(x)\in(a,b)$ in that interval, and that there is a constant $c<1$ such that for each $x,y \in (a,b)$, \begin{equation} \left|g(y)-g(x)\right| < c |x-y|. \tag{1} \end{equation} If $g$ is differentiable, this corresponds to $|g'(x)|\le c$ for $x\in(a,b)$ (note the absolute value: the derivative must be bounded away from both $1$ and $-1$).

The fixed point iteration is defined by $x_{k+1}=g(x_k)$, where $x_0$ is an arbitrarily chosen starting point in $(a,b)$.
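For concreteness, the iteration $x_{k+1}=g(x_k)$ can be sketched in a few lines of Python. The choice $g(x)=\cos x$, the starting point, and the tolerance below are illustrative assumptions of mine, not part of the quoted answer; $\cos$ works because it maps $(0,1)$ into itself and $|\cos'(x)|=|\sin x|\le\sin 1<1$ there.

```python
import math

def fixed_point_iteration(g, x0, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = g(x_k) until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Example: g(x) = cos(x) is a contraction on (0, 1).
x_hat = fixed_point_iteration(math.cos, 0.5)
```

The returned value satisfies $\hat{x}=\cos\hat{x}$ to within the tolerance.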

Let us assume that the function has a fixed point at $\hat{x}\in(a,b)$, that is $\hat{x}=g(\hat{x})$.

Now at step $k$, the absolute error of our current guess to the fixed point is $e_k = |x_k-\hat{x}|$. We get $$ e_{k+1} = |x_{k+1}-\hat{x}| = |g(x_k)-g(\hat{x})| < c|x_k - \hat{x}| = c e_k. $$

Hence the sequence $(e_k)_k$ is nonnegative and bounded above by the sequence $(c^ke_0)_k$, which converges to $0$. Therefore $\lim_{k\to\infty}e_k=0$, which means the fixed point iteration converges to $\hat{x}$.
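The geometric bound $e_k \le c^k e_0$ can be checked numerically. The specific example below ($g(x)=\cos x$ on $(0,1)$, with Lipschitz constant $c=\sin 1$ and a precomputed fixed point) is an illustrative assumption of mine, not from the quoted answer:

```python
import math

g = math.cos
c = math.sin(1.0)           # Lipschitz constant for cos on (0, 1): |cos'(x)| = |sin x| <= sin 1
x_hat = 0.7390851332151607  # fixed point of cos, solved in advance

x = 0.5
e0 = abs(x - x_hat)
for k in range(1, 11):
    x = g(x)
    e_k = abs(x - x_hat)
    # each actual error stays below the geometric bound c^k * e0
    assert e_k <= c**k * e0
```

Every iterate's error sits below the geometric envelope $c^k e_0$, exactly as the proof predicts.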


For general $g:\mathbb{R}\to\mathbb{R}$, we can make the following observations:

If (1) holds in $\mathbb{R}$, we can replace $(a,b)$ with $\mathbb{R}$ in the above proof. One can also see that the function has exactly one fixed point in that case (if $g$ is differentiable, the derivative of $g(x)-x$ is smaller than a negative constant, thus $g(x)-x$ has exactly one zero; if $g$ is not differentiable, a similar argument still holds).

If (1) does not hold in $\mathbb{R}$ but holds in an interval $(a,b)$ containing a fixed point $\hat{x}$, then for $x\in(a,b)$ we have $|g(x)-\hat{x}| < c|x-\hat{x}| < |x-\hat{x}|$, so each application of $g$ moves the point strictly closer to $\hat{x}$. In particular, $g$ maps the symmetric subinterval $(\hat{x}-d,\hat{x}+d)$, where $d=\min(\hat{x}-a,\,b-\hat{x})$, into itself, and the fixed point iteration converges to $\hat{x}$ if $x_0$ is chosen inside that subinterval.

I'm having difficulty understanding a couple of things here:

  1. This all depends on the existence of a constant $c<1$ such that (1) holds for each $x,y \in (a,b)$. But what justifies this assumption, and (in response to user "copper.hat"'s comment) how do we check that it is valid in any particular situation? Is there a theorem, perhaps one involving differentiability, that lets us conclude that (1) holds?

  2. Where did the $c^k$ in $(c^ke_0)_k$ come from?

Can someone please help me understand this? Thank you.