Understanding the proof of the Implicit Mapping Theorem


I am following *Advanced Calculus of Several Variables* by C. H. Edwards, Jr. I am unable to follow the logic of Theorem III-$3.4$, stated below.

Theorem $3.4$: Let the mapping $G: \mathscr{R}^{m+n} \rightarrow \mathscr{R}^{n}$ be $\mathscr{C}^{1}$ in a neighborhood of the point $(a,b)$ where $G(a,b)=0$. If the partial derivative matrix $D_{2} G(a, b)$ is nonsingular, then there exists a neighborhood $U$ of $a$ in $\mathscr{R}^{m}$, a neighborhood $W$ of $(a, b)$ in $\mathscr{R}^{m+n}$, and a $\mathscr{C}^{1}$ mapping $h: U \rightarrow \mathscr{R}^{n}$, such that $y=h(x)$ solves the equation $G(x, y)=0$ in $W$.

In particular, the implicitly defined mapping $h$ is the limit of the sequence of successive approximations defined inductively by

$$ \begin{aligned} &\qquad h_{0}(\mathbf{x})=\mathbf{b}, \quad h_{k+1}(\mathbf{x})=h_{k}(\mathbf{x})-D_{2} G(\mathbf{a}, \mathbf{b})^{-1} G\left(\mathbf{x}, h_{k}(\mathbf{x})\right) \end{aligned} $$

for $\mathbf{x} \in U$.
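To see the iteration in action, here is a minimal numerical sketch (my own example, not from the book): I take $G(x,y)=x^2+y^2-1$ with base point $(a,b)=(0,1)$, so $D_2G(a,b)=2b=2$ is nonsingular and the implicit solution near $x=0$ is $h(x)=\sqrt{1-x^2}$.

```python
import math

# Hypothetical example: G(x, y) = x^2 + y^2 - 1, with G(a, b) = 0 at (a, b) = (0, 1).
# D_2 G(a, b) = 2b = 2 is nonsingular, so Theorem 3.4 applies, and the implicit
# solution near x = 0 is h(x) = sqrt(1 - x^2).

def G(x, y):
    return x * x + y * y - 1.0

a, b = 0.0, 1.0
D2G_ab = 2.0 * b  # partial derivative of G with respect to y at (a, b)

def h(x, iterations=20):
    """Successive approximations h_{k+1}(x) = h_k(x) - D2G(a,b)^{-1} G(x, h_k(x))."""
    y = b  # h_0(x) = b
    for _ in range(iterations):
        y = y - G(x, y) / D2G_ab
    return y

x = 0.1
print(h(x), math.sqrt(1 - x * x))  # the two values agree to high precision
```

Note that the iteration never recomputes the derivative: the fixed matrix $D_2G(a,b)^{-1}$ is enough for convergence near $(a,b)$, which is exactly what the theorem exploits.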

Theorem $3.3$: Suppose that the mapping $f:\mathscr{R}^n\rightarrow\mathscr{R}^n$ is $\mathscr{C}^1$ in a neighborhood $W$ of the point $a$, with the matrix $f'(a)$ nonsingular. Then $f$ is locally invertible - there exist neighborhoods $U\subset W$ of $a$ and $V$ of $b=f(a)$ and a one-to-one $\mathscr{C}^1$ mapping $g:V\rightarrow W$ such that $$g(f(x))=x\quad\text{for } x \in U,$$ $$f(g(y))=y\quad\text{for } y \in V.$$ In particular, the local inverse $g$ is the limit of the sequence $\{g_k\}_{k=0}^\infty$ of successive approximations defined inductively by $$g_0(y)=a,\quad g_{k+1}(y)=g_k(y)-f'(a)^{-1}[f(g_k(y))-y]$$
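As a concrete illustration of Theorem 3.3's iteration (again my own example, not Edwards'), take $f(x)=e^x$ with $a=0$, $b=f(a)=1$, so $f'(a)=1$ and the local inverse is $g(y)=\ln y$:

```python
import math

# Hypothetical example: f(x) = exp(x), a = 0, b = f(a) = 1, so f'(a) = 1.
# The local inverse is g(y) = log(y); the theorem's iteration recovers it.

def f(x):
    return math.exp(x)

a = 0.0
fprime_a_inv = 1.0  # f'(a)^{-1} = 1 / e^0 = 1

def g(y, iterations=50):
    """Successive approximations g_{k+1}(y) = g_k(y) - f'(a)^{-1} [f(g_k(y)) - y]."""
    x = a  # g_0(y) = a
    for _ in range(iterations):
        x = x - fprime_a_inv * (f(x) - y)
    return x

y = 1.2
print(g(y), math.log(y))  # both ≈ 0.1823...
```

Each step subtracts the current residual $f(g_k(y))-y$, rescaled by the fixed inverse Jacobian; near the fixed point the error shrinks by roughly the factor $|1-f'(x^\ast)/f'(a)|$ per step, which is small when $y$ is close to $b$.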

Question $1$:

What I understand is that the inverse function theorem uses the implicit function theorem to guarantee that there exists a relationship (function) giving $y$ in terms of $x$ (not explicitly). But the iterative form doesn't make sense to me: why does applying the inverse Jacobian $f'(a)^{-1}$ to $[f(g_k(y))-y]$ give a better and better approximation of $g(y)$? What I know is that the Jacobian approximates $f$ locally by a linear transformation. So,

what information is encoded in $f'(a)^{-1}$ in Theorem $3.3$ and in $D_{2} G(\mathbf{a}, \mathbf{b})^{-1}$ in Theorem $3.4$?

Question $2$:

What's the main difference/motivation/intuition between these two theorems?


Maybe I am asking too many questions for a single thread, but since they are related to one another and all aim at understanding a single theorem, I have put them together. It would be a great help if someone could explain these questions.

Best answer:

Regarding question $2$,

you can rewrite $\underbrace{y=f(x)}_{G(x,y)=f(x)-y=0}$ from Theorem $3.3$ and apply Theorem $3.4$ with the roles of the two variable blocks swapped (solving for $x$ in terms of $y$, so the relevant nonsingular matrix is $D_1G$ rather than $D_2G$) to get the inverse mapping $g$ such that $x=g(y)$, where $D_1G(a,b)$ is exactly $f'(a)$. The successive approximations are then nothing but $$g_{k+1}(y)=g_k(y)-D_1G(a,b)^{-1}G(g_k(y),y)=g_k(y)-f'(a)^{-1}[f(g_k(y))-y].$$ I am not explicitly mentioning the neighborhoods, as you seem to understand them.

And I am pretty sure the book uses Theorem $3.3$ to prove Theorem $3.4$, which completes the equivalence of the two theorems.

Now, coming to question $1$: I can't imagine a better answer than the one @peek-a-boo wrote here. Let me quote its main idea, adapted to your question's context:

Near the point $(a,b)$, where $G(a,b) = 0$, we can use the power of differential calculus to say \begin{align}0 &= G(x,y) \\ &\approx \underbrace{G(a,b)}_{0}+D_1G(a,b) \cdot (x-a) + D_2G(a,b) \cdot (y-b) \quad \text{if $(x,y)$ is near $(a,b)$} \tag{$*$} \end{align} The approximation being better the closer $(x,y)$ is to $(a,b)$. So, if in this general case we impose the condition that $D_2G(a,b)$ is invertible (i.e its determinant is nonzero), then, we get \begin{align} y \approx - (D_2G(a,b))^{-1} \cdot D_1G(a,b) \cdot (x-a) + b \end{align}

If you understand the equivalence, then I guess you won't face any issue interpreting $f'(a)^{-1}$ either.
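The tangent-line formula in the quote can be checked numerically. In this sketch (my own example) I take $G(x,y)=x^2+y^2-1$ at the base point $(a,b)=(0.6,0.8)$, where $D_1G(a,b)=2a=1.2$ and $D_2G(a,b)=2b=1.6$, and compare the linear prediction $y\approx b-(D_2G(a,b))^{-1}D_1G(a,b)\,(x-a)$ against the exact implicit solution $h(x)=\sqrt{1-x^2}$:

```python
import math

# Hypothetical check of the linearization from the quoted answer:
# G(x, y) = x^2 + y^2 - 1, base point (a, b) = (0.6, 0.8) on the unit circle.
a, b = 0.6, 0.8
D1G = 2.0 * a  # = 1.2
D2G = 2.0 * b  # = 1.6

def y_linear(x):
    """Linear prediction y ≈ b - D2G^{-1} * D1G * (x - a)."""
    return b - (D1G / D2G) * (x - a)

def y_exact(x):
    """Exact implicit solution on the upper half circle."""
    return math.sqrt(1.0 - x * x)

# The slope -D1G/D2G = -0.75 equals h'(a) = -a/sqrt(1 - a^2), and the
# prediction error shrinks quadratically as x approaches a.
for dx in (0.1, 0.01, 0.001):
    x = a + dx
    print(dx, abs(y_linear(x) - y_exact(x)))
```

This is exactly the information encoded in $D_2G(a,b)^{-1}$: combined with $D_1G(a,b)$, it gives the slope of the implicitly defined function at the base point.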

Now, you might ask why everyone uses the linear approximation to answer that question. Why not use the local quadratic approximation $$ G(x, y) \approx G(a, b)+DG\left(a,b\right) \cdot\left[\begin{array}{l} x-a \\ y-b \end{array}\right]+\frac{1}{2}\left[\begin{array}{ll} x-a \quad y-b \end{array}\right] H_{G}\left(a, b\right)\left[\begin{array}{l} x-a \\ y-b \end{array}\right], $$ where $H_G$ is the Hessian matrix of $G$? The point is that we do not try to obtain the function $h$ or $g$ in a single shot; we delegate the computation to the successive approximation scheme. There is no rule forcing linearization, but the higher the order of approximation used, the more computation each step requires, which is not computationally efficient.