Question regarding the requirement (determinant) for implicit function theorem


[Image: the book's statement of the Implicit Function Theorem]

The picture gives part of the statement of the Implicit Function Theorem. I know the characterization of a nonzero determinant in terms of linear independence of the rows (one per equation) in $\mathbb{R}^k$, but other than that I cannot see why $\det D_yF(a,b)$ must be nonzero in order for $F(x,y)=0$ to be solvable for $y$.


Best answer:

What I'll provide is a motivation for why we might impose/how we might come up with the condition $\det (D_yF(a,b)) \neq 0$. For the full explanation of where this fact is used, of course just refer to the proof in your book.

I hope you know that differential calculus (roughly speaking) is the theory of locally approximating by linear functions, because linear things are nice to work with. So the key idea behind things like implicit function theorem/inverse function theorem or really any "big" theorem in differential calculus is to say to yourself

"Right now I have a very general and difficult problem. Can I solve this problem in the special case where everything is nice and linear? Can I then use the insight I gained from the linear case to solve the general case?"

So, in the spirit of this guiding principle, we consider a very special case: let $A \in M_{k \times n}(\Bbb{R})$ and $B \in M_{k \times k}(\Bbb{R})$, and define the function $G: \Bbb{R}^n \times \Bbb{R}^k \to \Bbb{R}^k$ by \begin{equation} G(x,y) = Ax + By. \end{equation} Now the question at hand is: if $G(x,y) = 0$, can I solve for $y$ in terms of $x$? The answer is pretty simple in this case, because if the matrix $B$ is invertible (i.e. $\det B \neq 0$) then \begin{equation} G(x,y) = 0 \end{equation} implies that \begin{align} Ax + By = 0, \end{align} and hence \begin{align} y = -(B^{-1}A) x. \end{align}
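If it helps to see this concretely, here is a small numerical sketch of the linear case (the matrices $A$ and $B$ below are made up purely for illustration):

```python
import numpy as np

# Linear special case G(x, y) = A x + B y with n = 3, k = 2.
# A is k x n; B is k x k and invertible (det B = 6 != 0).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])

x = np.array([1.0, -2.0, 0.5])      # an arbitrary x in R^n

# G(x, y) = 0  =>  B y = -A x  =>  y = -B^{-1} A x
y = -np.linalg.solve(B, A @ x)

print(np.allclose(A @ x + B @ y, 0))   # True: (x, y) solves G(x, y) = 0
```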

So, to solve the problem in this special case, we had to make the assumption that $B$ is invertible (i.e. $\det B \neq 0$). This is the key insight we gained by solving the special linear case!

This is useful because in general the function $F$ you have been given in the theorem might be very complicated, so you don't know what it really looks like. However, near a point $(a,b)$, where $F(a,b) = 0$, we can use the power of differential calculus to say \begin{align} F(x,y) \approx D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b) \quad \text{if $(x,y)$ is near $(a,b)$} \tag{$*$} \end{align} (the approximation being better the closer $(x,y)$ is to $(a,b)$)

Now, the actual question you're being asked is: if $F(x,y) = 0$, can we solve for $y$ in terms of $x$ (at least for $(x,y)$ close to $(a,b)$)? This is a difficult problem, but we can use the linear approximation ($*$) to get a rough idea: we have \begin{align} 0 &= F(x,y) \\ & \approx D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b). \end{align} Notice how this is almost the situation we had above with the function $G$: here, $A = D_xF(a,b)$ and $B = D_yF(a,b)$. So, if in this general case we impose the condition that $D_yF(a,b)$ is invertible (i.e. its determinant is nonzero), then we get \begin{align} y \approx - (D_yF(a,b))^{-1} \cdot D_xF(a,b) \cdot (x-a) + b. \end{align}
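To see how good this approximate solution is, here is a quick numerical sketch with a made-up scalar example ($n = k = 1$, $F(x,y) = y + y^3 - x$ at $(a,b) = (0,0)$; this example is my own, not from the book):

```python
import numpy as np

# Hypothetical scalar example: F(x, y) = y + y^3 - x, with F(0, 0) = 0,
# D_xF(0, 0) = -1 and D_yF(0, 0) = 1 (invertible: determinant 1 != 0).
a, b = 0.0, 0.0
DxF, DyF = -1.0, 1.0

def F(x, y):
    return y + y**3 - x

# Approximate solution from the linearization:
#   y ≈ b - DyF^{-1} * DxF * (x - a) = x
for x in [0.1, 0.01, 0.001]:
    y_approx = b - (DxF / DyF) * (x - a)
    residual = F(x, y_approx)        # equals x^3 for this F
    print(x, residual)               # residual shrinks much faster than x
```

The residual $F(x, y_{\text{approx}}) = x^3$ vanishes faster than $x$ itself, which is exactly the behavior the $\approx$ signs above are hiding.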

Thus, we have used our knowledge of the exact solution in the special linearized case to get a "rough approximate solution" in the general case. Now, all that remains to rigorously prove the theorem is to do some detailed and technical analysis of all the error terms wherever I said $\approx$ above, and to show that even in the general case, we really can solve for $y$ in terms of $x$, provided that $D_yF(a,b)$ is invertible (your book should cover all the detailed arguments).

This is the motivation for why we put $\det D_yF(a,b) \neq 0$ as part of our hypothesis, and it also outlines the thought process of how one might come up with such a requirement. Of course, after coming up with such a requirement, one can come up with examples to show that if this condition is not satisfied, then we cannot solve for $y$ in terms of $x$.


Indeed a simple example to show that the assumption $\det D_yF(a,b) \neq 0$ is needed for the theorem to be true is the following:

Let $k=n=1$ and define $F: \Bbb{R} \times \Bbb{R} \to \Bbb{R}$ by $F(x,y) = x^2 + y^2 - 1$. Choose $(a,b) = (1,0)$. Then clearly $F(1,0) = 0$ and $D_yF(1,0) = 0$ (this is a $1 \times 1$ matrix), so the determinant is also $0$.

Now, notice that the set of $(x,y)$ satisfying $F(x,y) = 0$ is the unit circle in the plane. It should be clear pictorially that near $(1,0)$ it is impossible to solve for $y$ as a function of $x$.

Solving was not possible in this case because the determinant was $0$; this shows why the determinant condition is required. (However, notice that $D_xF(1,0) = 2 \neq 0$, so we can solve for $x$ as a function of $y$.)
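A short numerical check of this circle example:

```python
import numpy as np

# F(x, y) = x^2 + y^2 - 1 near (a, b) = (1, 0), where D_yF(1, 0) = 0.
x = 0.99
ys = [np.sqrt(1 - x**2), -np.sqrt(1 - x**2)]   # two distinct solutions
print([x**2 + y**2 - 1 for y in ys])           # both residuals are ~0

# y is NOT a function of x near (1, 0): one x, two values of y.
print(ys[0] != ys[1])                          # True

# But D_xF(1, 0) = 2 != 0, and indeed x = sqrt(1 - y^2) solves for x:
y = 0.1
x_of_y = np.sqrt(1 - y**2)
print(abs(x_of_y**2 + y**2 - 1) < 1e-12)       # True
```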


Edit in response to comments:

Recall that in general, by definition, for any function $F: \Bbb{R}^p \to \Bbb{R}^m$, we say $F$ is differentiable at $\alpha$ if there is an $m \times p$ matrix $T$ such that \begin{equation} F(\xi) - F(\alpha) = T(\xi - \alpha) + o(\lVert\xi - \alpha \rVert). \end{equation} If $F$ is differentiable at $\alpha$, then $T$ is unique, and we denote it by the symbol $DF(\alpha)$. I.e. we can approximate the change $F(\xi)-F(\alpha)$ by a linear part $DF(\alpha) \cdot (\xi -\alpha)$, and the approximation is valid up to an accuracy of little-oh.

In your particular case, write $p = n+k$, $m = k$, $\xi = \begin{bmatrix} x \\y \end{bmatrix} $, and write $\alpha = (a,b)$. Note that we have the following block matrix decomposition: \begin{align} DF(a,b) = \begin{bmatrix} D_xF(a,b) & D_yF(a,b) \end{bmatrix} \end{align} Hence, we get \begin{align} F(x,y) &= F(a,b) + DF(a,b) \cdot \begin{bmatrix} x-a \\ y-b \end{bmatrix} + o(\lVert (x,y) - (a,b)\rVert) \\ &= F(a,b) + \begin{bmatrix} D_xF(a,b) & D_yF(a,b) \end{bmatrix} \cdot \begin{bmatrix} x-a \\ y-b \end{bmatrix} + o(\lVert (x,y) - (a,b)\rVert) \\ &= F(a,b) + D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b) + o(\lVert (x,y) - (a,b)\rVert) \end{align}

This is the proper statement in general, and everything is an equal sign (there are no approximations, because we already took the error term into account with the little-oh notation). In the case of the implicit function theorem, we have $F(a,b) = 0$ by assumption. Hence, we get the statement \begin{equation} F(x,y) = D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b) + o(\lVert (x,y) - (a,b)\rVert) \end{equation}
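One can check this little-oh behavior numerically. Here is a sketch using $F(x,y) = x^2 + y^2 - 1$ at the point $(a,b) = (0,1)$ (my own choice of example), where the error divided by $\lVert (x,y)-(a,b) \rVert$ visibly goes to $0$:

```python
import numpy as np

# F(x, y) = x^2 + y^2 - 1 at (a, b) = (0, 1):
# F(a, b) = 0, D_xF(a, b) = 0, D_yF(a, b) = 2.
a, b = 0.0, 1.0
for t in [1e-1, 1e-2, 1e-3]:
    h = np.array([t, t])                       # displacement (x - a, y - b)
    exact = (a + h[0])**2 + (b + h[1])**2 - 1  # F(x, y)
    linear = 0.0 * h[0] + 2.0 * h[1]           # D_xF*(x-a) + D_yF*(y-b)
    error = exact - linear                     # equals h[0]^2 + h[1]^2
    print(t, error / np.linalg.norm(h))        # ratio -> 0: error is o(||h||)
```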

(In my above explanation, I was too lazy to carry around the little-oh, so I just wrote $\approx$ everywhere instead)