Why is this a safe neighborhood for uniform convergence of the sequence implicitly defining a function?


This discussion pertains to Theorem III-1.4 in C.H. Edwards, Jr.'s Advanced Calculus of Several Variables.

I am trying to understand why the neighborhood described in the proof of the implicit function theorem is "safe" for convergence of the contraction mapping, defined inductively by the sequence:

$f_{n+1}[x]=f_{n}[x]-\frac{G[x,f_{n}[x]]}{D_{2}G[a,b]}$, $f_{0}[x]=b$.
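As a concrete illustration (my own example, not from Edwards), the iteration can be run numerically for the sample function $G[x,y]=x^2+y^2-1$ near $(a,b)=(0,1)$, where $D_{2}G[a,b]=2$ and the true solution curve is $f[x]=\sqrt{1-x^2}$:

```python
import math

def G(x, y):
    # Sample function (my choice, not from the book): zero set is the unit circle
    return x * x + y * y - 1.0

a, b = 0.0, 1.0
D2G_ab = 2.0 * b  # D_2 G[a,b] = 2y evaluated at (a,b)

def implicit_iterate(x, n_steps=50):
    """Run f_{n+1}[x] = f_n[x] - G[x, f_n[x]] / D_2 G[a,b], starting from f_0[x] = b."""
    f = b
    for _ in range(n_steps):
        f = f - G(x, f) / D2G_ab
    return f

x_star = 0.3
approx = implicit_iterate(x_star)
exact = math.sqrt(1.0 - x_star ** 2)  # the known solution curve for this sample G
print(abs(approx - exact))
```

With a contraction factor well below $1$ near $(0,1)$, the iterates settle onto the solution curve to machine precision within a few dozen steps.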

The neighborhood is given as follows:

$\varepsilon\ge\lvert x-a\rvert\bigwedge\varepsilon\ge\lvert y-b\rvert\implies\lvert D_{2}G[x,y]-D_{2}G[a,b]\rvert\le\frac{1}{2}\lvert D_{2}G[a,b]\rvert$

$\varepsilon>\delta\ge\lvert x-a\rvert\implies\lvert G[x,b]\rvert\le\frac{1}{2}\varepsilon\lvert D_{2}G[a,b]\rvert$
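Both conditions can be verified on a grid for a sample choice of $G[x,y]=x^2+y^2-1$ with $(a,b)=(0,1)$; the values $\varepsilon=0.5$ and $\delta=0.4$ below are my own candidates for this example, not the book's:

```python
import numpy as np

def G(x, y):          # sample function (my choice): unit circle
    return x**2 + y**2 - 1.0

def D2G(x, y):        # partial derivative of G with respect to y
    return 2.0 * y

a, b = 0.0, 1.0
eps, delta = 0.5, 0.4  # candidate neighborhood sizes for this example
d2_ab = D2G(a, b)

# Condition 1: |D2G[x,y] - D2G[a,b]| <= (1/2)|D2G[a,b]| on the eps-square
xs = np.linspace(a - eps, a + eps, 201)
ys = np.linspace(b - eps, b + eps, 201)
X, Y = np.meshgrid(xs, ys)
cond1 = np.all(np.abs(D2G(X, Y) - d2_ab) <= 0.5 * abs(d2_ab))

# Condition 2: |G[x,b]| <= (1/2) eps |D2G[a,b]| for |x - a| <= delta
xd = np.linspace(a - delta, a + delta, 201)
cond2 = np.all(np.abs(G(xd, b)) <= 0.5 * eps * abs(d2_ab))

print(cond1, cond2)
```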

A safe neighborhood for convergence. The graph depicts the trace on the surface $\{x,y,G[x,y]\}$ of the square $[a-\varepsilon,a+\varepsilon]\times[b-\varepsilon,b+\varepsilon]$ in green.

The further $\delta$ restriction is indicated in magenta, as is the trace of the line $y=b$.

The trace of $x=a$ is blue.

The white angle represents the projection of the point $\{a,b+\frac{\varepsilon}{2},\frac{\varepsilon}{2}D_{2}G[a,b]\}$.

$\frac{\varepsilon}{2}D_{2}G[a,b]$ also gives the height of the upper plane.

The actual solution curve $\{x,f[x],0\}$ is the intersection of the lower plane and the surface $\{x,y,G[x,y]\}$.

The red parallelogram represents the condition $G[a,y]=2(y-b)D_{2}G[a,y]$ which leads to an infinite loop. The choice of $\varepsilon$ prevents that from happening on the blue curve. This is explained in my answer to a previous question: Neighborhoods necessary for convergence of a sequence.

What I am trying to determine is why the rectangle $[a-\delta,a+\delta]\times[b-\varepsilon,b+\varepsilon]$ is safe for convergence for all $x\in[a-\delta,a+\delta]$. In other words, how it avoids the infinite loop condition.

A second diagram shows a simpler depiction of the problem using a different $G[x,y]$.

Edit to add: I failed to mention a potentially significant assertion regarding the second image. The rectangle in that image was chosen so that the trace of $G$ along the line segment from $(c_1,d_1)$ to $(c_2,d_1)$ lies completely below the plane $z=0$, and the trace along the segment from $(c_1,d_2)$ to $(c_2,d_2)$ lies completely above it. $D_2G[x,y]>0$ holds everywhere in the rectangle $[c_1,c_2]\times[d_1,d_2]$.

Simpler depiction of an implicitly determined function.

The objective is to show that $f_{n+1}[x_*]=f_{n}[x_*]-\frac{G[x_*,f_{n}[x_*]]}{D_{2}G[a,b]}$ converges to a point on the solution curve for all $x_*$ in some neighborhood within the rectangle $[c_1,c_2]\times[d_1,d_2]$. The neighborhood described above would lie within that rectangle.


Edit to modify: I observe that the portions of the green curve with constant $x$ never intersect the plane $z=0$ in the neighborhood, so it is correct to exclude them. Though the magenta curves do intersect the solution curve, that is no guarantee that they would do so with a different $G[x,y]$.
Edit to add: I also note that the hypotheses of the theorem only stipulate $G$ is $\mathscr{C}^1$. That makes talking about inflection points difficult. But they seem relevant.

It seems proper to examine what happens at the extreme conditions such as:

$D_2G[x_*,y]=\frac{1}{2}D_2G[a,b]$,

$D_2G[x_*,y]=\frac{3}{2}D_2G[a,b]$

and

$\lvert D_{2}G[x,y]-D_{2}G[a,b]\rvert=\frac{1}{2}\lvert D_{2}G[a,b]\rvert$

I've removed my own speculations about those conditions because I believe they were too restrictive (IOW, wrong).


Edit to add: Another observation that seems significant is that all sequences begin with the base state $f_0[x]=b$. I'm pretty sure that means all subsequent $f_n[x]$ will be on the same side of the plane $y=b$ as $f_1[x]$.

Unfortunately I don't have a good exact vocabulary, nor theory of this problem domain. IOW, I'm hand-waving it. All I'm seeking at this point is an intuitively satisfying geometric explanation.


Edit to add: a third graphic shows a "side view" of the situation, with the $x$ dimension suppressed. By collapsing the problem into the consideration of a family of curves in two dimensions which adhere to the stated conditions, the problem might reduce to one of real analysis.

The yellow lines depict the range of allowable slopes for any curve in the neighborhood, determined by $\varepsilon\ge\lvert x-a\rvert\bigwedge\varepsilon\ge\lvert y-b\rvert\implies\lvert D_{2}G[x,y]-D_{2}G[a,b]\rvert\le\frac{1}{2}\lvert D_{2}G[a,b]\rvert$

The blue curve and tangent line represent a safe state. The red parallelogram and chord depict a loop state and the slope of twice the loop tangent, respectively. The green vertical lines represent $y=\pm\varepsilon$. The vertical magenta line segment represents the restriction:

$\varepsilon>\delta\ge\lvert x-a\rvert\implies\lvert G[x,b]\rvert\le\frac{1}{2}\varepsilon\lvert D_{2}G[a,b]\rvert$.

That is, (I believe) every 2-dimensional projection of a curve must intersect the magenta line segment. Furthermore, the base term of every sequence is $f_0[x]=b$. The first iteration is therefore: $f_{1}[x]=b-\frac{G[x,b]}{D_{2}G[a,b]}$.

silhouette view


There are 2 best solutions below

BEST ANSWER

The $\delta$ restriction in conjunction with the $\varepsilon$ restriction ensures that $\varphi[b]=b-\frac{G[x_{*},b]}{D_{2}G[a,b]}\in[b-\varepsilon,b+\varepsilon]$, so the hypotheses of the contraction mapping theorem are satisfied. This is because the maximal height $\lvert G[x,b]\rvert=\frac{1}{2}\varepsilon\lvert D_{2}G[a,b]\rvert$ divided by the minimal slope $\lvert D_{2}G[x,y]\rvert=\frac{1}{2}\lvert D_{2}G[a,b]\rvert$ is $\frac{\frac{1}{2}\varepsilon\lvert D_{2}G[a,b]\rvert}{\frac{1}{2}\lvert D_{2}G[a,b]\rvert}=\varepsilon$.
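This estimate can be sanity-checked numerically for a sample $G[x,y]=x^2+y^2-1$ with $(a,b)=(0,1)$ and candidate values $\varepsilon=0.5$, $\delta=0.4$ (my own illustration, not from the book): $\varphi[b]$ lands back in $[b-\varepsilon,b+\varepsilon]$ for every $x_*$ in $[a-\delta,a+\delta]$.

```python
import numpy as np

def G(x, y):           # sample G (my choice): zero set is the unit circle
    return x**2 + y**2 - 1.0

a, b = 0.0, 1.0
d2_ab = 2.0 * b        # D_2 G[a,b] for this sample G
eps, delta = 0.5, 0.4  # candidate neighborhood sizes for this example

# First contraction step applied to b, across the whole delta-interval of x_*
x_star = np.linspace(a - delta, a + delta, 401)
phi_b = b - G(x_star, b) / d2_ab

# phi[b] must land back in [b - eps, b + eps] for every x_*
inside = np.all((phi_b >= b - eps) & (phi_b <= b + eps))
print(inside)
```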

This does not give me the intuitively satisfying geometrical picture I am seeking, but I believe it answers the question in the mathematical sense.


Edit to add better explanation of how this avoids the loop condition.

The loop condition discussed in Neighborhoods necessary for convergence of a sequence is:

$2xf^{\prime}[x]=f[x]$.

In order to avoid it, require

$f^{\prime}[x]>\frac{f[x]}{2x}$.

Translating that into terms of the current discussion:

$\lvert D_{2}G[x_{*},y]\rvert\ge\frac{1}{2}\lvert D_{2}G[a,b]\rvert$ corresponds with the minimal $f^{\prime}[x]$.

$\lvert G[x_{*},b]\rvert\le\frac{1}{2}\varepsilon\lvert D_{2}G[a,b]\rvert$ corresponds with the maximal $f[x]$.

Writing the previous inequality in these terms gives

$\lvert D_{2}G[x_{*},y]\rvert>\frac{\lvert G[x_{*},b]\rvert}{2\lvert y-b\rvert}$.

Replacing the terms with their limiting values gives

$\frac{1}{2}\lvert D_{2}G[a,b]\rvert>\frac{\frac{1}{2}\varepsilon\lvert D_{2}G[a,b]\rvert}{2\varepsilon}$

$\lvert D_{2}G[a,b]\rvert>\frac{\lvert D_{2}G[a,b]\rvert}{2}$.

So the neighborhood seems safe from loops.

ANSWER

Look at page 169, on the sixth line, for the sentence that begins "First note that". A key question I have for you is: do you agree with the estimate, on page 169, that says that, for all $y$ in the interval $[b - \epsilon, b + \epsilon]$, we have$$|\phi'(y)| \le {1\over2}?$$If that estimate seems okay to you, then I think I can explain why the sequence$$f_1(x_*), \text{ } f_2(x_*), \text{ } f_3(x_*),\ldots$$converges, and, in particular, does not get into an infinite loop.
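For what it's worth, that estimate can be tested numerically on a sample function (my own choice, not Edwards's): with $G[x,y]=x^2+y^2-1$, $(a,b)=(0,1)$, and $\varepsilon=0.5$, one has $\phi'(y)=1-D_2G[x_*,y]/D_2G[a,b]$, and the bound $|\phi'(y)|\le\frac{1}{2}$ holds on all of $[b-\epsilon,b+\epsilon]$:

```python
import numpy as np

def D2G(x, y):          # partial of the sample G (my choice) with respect to y
    return 2.0 * y

a, b = 0.0, 1.0
d2_ab = D2G(a, b)
eps = 0.5
x_star = 0.3            # a point with |x_star - a| <= delta

# phi(y) = y - G(x_star, y)/D2G(a,b), so phi'(y) = 1 - D2G(x_star, y)/D2G(a,b)
ys = np.linspace(b - eps, b + eps, 401)
phi_prime = 1.0 - D2G(x_star, ys) / d2_ab

bound_holds = np.all(np.abs(phi_prime) <= 0.5)
print(bound_holds)
```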

Would that answer your question?

I should also be clear about the "orders of quantification" in Theorem 1.4, which I could explain by turning the theorem into a game.

You give me a function $G$ and a point $(a, b)$ satisfying the conditions in the theorem. I then give you an interval $J$ and a function $f: J \to \mathbb{R}$. You then pick a point $x_*$ in $J$. We check whether$$G(x_*, f(x_*)) = 0.$$If yes, I win. If no, you win.

My strategy for winning is given in the proof. The start of this strategy is to choose $\epsilon$, then $\delta$. I then give you, as my move,$$J := [a - \delta, a + \delta]$$and use contraction methods to define $f$, and give that to you as well. In the proof, I want to convince myself that, no matter which $x_*$ in $J$ you choose, the contraction method will yield $f(x_*)$ in such a way that$$G(x_*, f(x_*)) = 0.$$I think you may be concerned that, if $x_*$ is chosen too far away from $a$, then the contraction method yields a sequence that does not converge, and possibly gets caught in an infinite loop. However, remember that, as part of my move in the game, I choose $\delta$, and then use it to define $J$.

You are therefore constrained on where you can choose $x_*$.

So, if you can come up with an example (maybe in pictures) where you see an $x_*$ that leads to an infinite loop, it may be that, as part of my strategy in choosing $J$, I do not allow you to use that $x_*$.

So the theorem says that for every $G$ and $(a, b)$ there exists $J$ and $f$ such that, for every $x_*$ in $J$,$$G(x_*, f(x_*)) = 0.$$It does not say that for every $G$ and $(a, b)$ and $x_*$ there exists $f$ such that$$G(x_*, f(x_*)) = 0.$$That is, in the game, you do not get to choose $x_*$ before I give you $J$.

The "for every" and "there exists" are called quantifiers. Order of quantification is something that often causes confusion.

I do understand where the contraction constant $k={1\over2}$ came from. It is a direct result of the choice of $\epsilon$. I also understand that $\delta$ further restricts the interval in which $x$ lives. What I don't understand is why the particular choice of $\delta$ protects me from a loop condition. I'm also dubious of the $\mathscr{C}^1$ hypothesis. It seems the loop condition involves what I consider to be $\mathscr{C}^2$ properties. To wit, inflection.

Do you agree that, for all integers $n \ge 0$ and all $x_*$ in $[a - \delta, a + \delta]$, we have $f_{n+1}(x_*) = \phi(f_n(x_*))$?

Step 1: Prove that $b$, $\phi(b)$, $\phi(\phi(b))$, $\phi(\phi(\phi(b)))$, $\ldots$ are all elements of $[b - \epsilon, b + \epsilon]$.

Step 2: Prove that$$f_0(x_*) = b, \text{ }f_1(x_*) = \phi(b), \text{ }f_2(x_*) = \phi(\phi(b)), \text{ }f_3(x_*) = \phi(\phi(\phi(b))), \ldots$$etc.

Step 3: Prove that the sequence$$b, \text{ }\phi(b), \text{ }\phi(\phi(b)), \text{ }\phi(\phi(\phi(b))), \ldots$$is Cauchy, and so is convergent, and in particular, cannot be a repeating nonconstant sequence. That is, it cannot be an "infinite loop".

Do you agree that, if I can get these three steps proved, then we are done?

Let us handle Step 1.

Step 1: Prove that $b$, $\phi(b)$, $\phi(\phi(b))$, $\phi(\phi(\phi(b)))$, $\ldots$ are all elements of $[b - \epsilon, b + \epsilon]$.

If you look at the third line from the bottom of page 168, you see$$|G(x, b)| \le {1\over2}\epsilon |D_2G(a, b)|.$$This implies the following inequality:$$|\phi(b) - b| \le {1\over2}\epsilon.\tag*{$(1)$}$$Then, because $\phi$ is a ${1\over2}$-contraction, this implies the following inequality:$$|\phi(\phi(b)) - \phi(b)| \le {1\over4}\epsilon.\tag*{$(2)$}$$Then, because $\phi$ is a ${1\over2}$-contraction, this implies the following inequality:$$|\phi(\phi(\phi(b))) - \phi(\phi(b))| \le {1\over8}\epsilon.\tag*{$(3)$}$$Then, because $\phi$ is a ${1\over2}$-contraction, this implies the following inequality:$$|\phi(\phi(\phi(\phi(b)))) - \phi(\phi(\phi(b)))| \le {1\over{16}}\epsilon.\tag*{$(4)$}$$ Etc.
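These inequalities can be observed numerically on a sample function (my own illustration, not from the book): for $G[x,y]=x^2+y^2-1$, $(a,b)=(0,1)$, $\varepsilon=0.5$, and a fixed $x_*=0.3$, the consecutive gaps in the orbit of $b$ under $\phi$ shrink at least geometrically, and every iterate stays in $[b-\epsilon,b+\epsilon]$:

```python
def G(x, y):            # sample G (my choice): unit circle
    return x**2 + y**2 - 1.0

a, b = 0.0, 1.0
d2_ab = 2.0 * b         # D_2 G[a,b] for this sample G
eps = 0.5
x_star = 0.3            # a point with |x_star - a| <= delta

def phi(y):
    return y - G(x_star, y) / d2_ab

# Forward orbit: b, phi(b), phi(phi(b)), ...
orbit = [b]
for _ in range(10):
    orbit.append(phi(orbit[-1]))

# Inequalities (1), (2), (3), ...: consecutive gaps bounded by (1/2)^(n+1) * eps
ok = all(
    abs(orbit[n + 1] - orbit[n]) <= (0.5 ** (n + 1)) * eps
    for n in range(10)
)
# ...and, as in Step 1, every iterate stays inside [b - eps, b + eps]
stays = all(b - eps <= y <= b + eps for y in orbit)
print(ok, stays)
```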

By $(1)$,$$|\phi(b) - b | \le {1\over2}\epsilon < \epsilon,$$ so $\phi(b)$ is in $[ b - \epsilon , b + \epsilon ]$.

By $(1)$, $(2)$, and the triangle inequality,$$| \phi(\phi(b)) - b | \le {1\over2}\epsilon + {1\over4}\epsilon < \epsilon,$$ so $\phi(\phi(b))$ is in $[ b - \epsilon , b + \epsilon ]$.

By $(1)$, $(2)$, $(3)$, and the triangle inequality, $$|\phi(\phi(\phi(b))) - b | \le {1\over2}\epsilon + {1\over4}\epsilon + {1\over8}\epsilon < \epsilon,$$ so $\phi(\phi(\phi(b)))$ is in $[ b - \epsilon , b + \epsilon ]$. Etc.

This finishes Step 1. (Let me know if you understand. If not, try to point out the first statement I made for which you cannot see that it follows from earlier statements.)

Could you look at Step 2, and see if you can figure it out? Just look carefully at the definitions of $f_{n+1}$ and $\phi$. If it does not make sense, let me know, and I will write something about it.

Now let us work on Step 3.

Step 3: Prove that the sequence$$b, \text{ }\phi(b), \text{ }\phi(\phi(b)), \text{ }\phi(\phi(\phi(b))), \ldots$$is Cauchy, and so is convergent, and in particular, cannot be a repeating nonconstant sequence. That is, it cannot be an "infinite loop".

The sequence$$b,\text{ }\phi(b),\text{ }\phi(\phi(b)),\text{ }\phi(\phi(\phi(b))), \text{ }\phi(\phi(\phi(\phi(b)))),\ldots$$is sometimes called the "forward orbit" of $b$ under $\phi$.

A key observation is that each consecutive distance between pairs of points in any forward orbit is less than or equal to ${1\over2}$ times the preceding consecutive distance. So every forward orbit has geometrically decaying consecutive distances. Such a sequence is always Cauchy, hence convergent. Here is an explanation of why.

First, note that a sequence may not converge even if its consecutive distances go to zero. For example, in the sequence of real numbers$$1,\text{ } 1 + {1\over2}, 1 + {1\over2} + {1\over3}, 1 + {1\over2} + {1\over3}+ {1\over4}, \ldots$$ the consecutive distances are $${1\over2},\text{ }{1\over3},\text{ }{1\over4}, \ldots$$which tend to zero. However, the sequence does not converge to any real number, because it is the sequence of partial sums of the harmonic series; see the following.

https://en.wikipedia.org/wiki/Harmonic_series_(mathematics)

On the other hand if you have a sequence that has geometrically decaying consecutive distances, like $$1, \text{ }1 + {1\over2},\text{ }1 + {1\over2} + {1\over4},\text{ }1 + {1\over2} + {1\over4} + {1\over8}, \ldots$$ then it is always Cauchy, hence convergent.

For example, note that if $i<j$, then the distance between the $i$th and $j$th terms of$$1, \text{ }1 + {1\over2},\text{ }1 + {1\over2} + {1\over4},\text{ }1 + {1\over2} + {1\over4} + {1\over8}, \ldots$$ is less than or equal to$$\left({1\over2}\right)^i + \ldots + \left({1\over2}\right)^{j-1},$$which is less than or equal to $({1\over2})^{i-1}$. When $i$ is large, $({1\over2})^{i-1}$ can be made as small as you like.
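The bound in the last sentence can be spot-checked numerically (a small sketch of my own, not part of the answer):

```python
# Terms of the geometrically decaying example sequence:
# 1, 1 + 1/2, 1 + 1/2 + 1/4, ...
def term(m):
    """m-th term (1-indexed): partial sum of the geometric series sum 0.5^k."""
    return sum(0.5 ** k for k in range(m))

# Check |term(j) - term(i)| <= (1/2)^(i-1) for a range of i < j
# (a tiny tolerance absorbs floating-point rounding)
ok = all(
    abs(term(j) - term(i)) <= 0.5 ** (i - 1) + 1e-12
    for i in range(1, 20)
    for j in range(i + 1, 25)
)
print(ok)
```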

Let me know if all this makes sense. If not, then please let me know the first statement I make that you cannot figure out and I will try to add detail. I am leaving a lot here for you to figure out, so I really do understand if you need more explanation here and there.