Multivariate Newton's method - how to apply the chain rule to $g(x)=x-(D_xf)^{-1}f(x)$


Let $f:\mathbb R ^n\to \mathbb R ^n$. Following page 6 here, I'm trying to show the multivariable iteration function $g(x)=x-(D_xf)^{-1}f(x)$ (the analogue of the 1d version for Newton-Raphson) becomes a contraction on a small enough closed ball about a point $p$ with $f(p)=0$.
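To fix ideas, here is a minimal numerical sketch of the iteration $g(x)=x-(D_xf)^{-1}f(x)$, using a made-up example system $f(x,y)=(x^2+y^2-1,\ x-y)$ whose roots are $(\pm 1/\sqrt2, \pm 1/\sqrt2)$; the function names and starting point are purely illustrative.

```python
import numpy as np

# Hypothetical example system: f(x, y) = (x^2 + y^2 - 1, x - y),
# with roots (±1/√2, ±1/√2).
def f(v):
    x, y = v
    return np.array([x**2 + y**2 - 1.0, x - y])

def Df(v):
    # Jacobian matrix D_x f.
    x, y = v
    return np.array([[2*x, 2*y],
                     [1.0, -1.0]])

def g(v):
    # Newton iteration map g(x) = x - (D_x f)^{-1} f(x),
    # computed by solving a linear system instead of forming the inverse.
    return v - np.linalg.solve(Df(v), f(v))

v = np.array([1.0, 0.5])
for _ in range(10):
    v = g(v)
print(v)  # converges to (1/√2, 1/√2) ≈ (0.7071, 0.7071)
```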

Most of the argument seems straightforward to generalize, but I'm stumped by the third paragraph: how does it generalize to higher dimensions? The final paragraph of page 6 just mentions analogous arguments, but I've realized I don't even know how to painlessly apply the chain rule to $g(x)=x-(D_xf)^{-1}f(x)$.

Finally, I was told that $f$ being $C^1$ is enough, whereas the proofs I found all use at least the existence of second derivatives to differentiate $g$.

How can the argument be carried over as sketched in the final paragraph of page 6? Is there a proof relying only on $f$ being $C^1$?

Best Answer

First of all, the derivative of the map $x \mapsto x$ is the identity matrix $\mathrm{Id}_n$. Your real question is how to apply the chain rule to the second term, $(D_xf)^{-1} f(x)$. Note, however, that the version I've seen before (when proving the inverse function theorem) does not use $(D_xf)^{-1}$: it uses $(D_pf)^{-1}$ for some nearby point $p$ with nonsingular derivative. The point is that $p$ is fixed in $g$, so the inverse matrix does not depend on the input $x$.

In general, the chain rule says that for maps $\alpha \colon \Bbb{R}^m \to \Bbb{R}^n$ and $\beta \colon \Bbb{R}^n \to \Bbb{R}^k$, and for $x \in \Bbb{R}^m$, $$D_x(\beta \circ \alpha) = D_{\alpha(x)} \beta \, \circ D_x \alpha $$ In other words, it is the Jacobian matrix of $\beta$ at $\alpha(x)$ times the Jacobian matrix of $\alpha$ at $x$.
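A quick finite-difference sanity check of this matrix identity, on two made-up smooth maps $\alpha, \beta \colon \Bbb{R}^2 \to \Bbb{R}^2$ (the particular formulas are arbitrary, chosen only to be nonlinear):

```python
import numpy as np

# Check D_x(beta∘alpha) = (D_{alpha(x)} beta)(D_x alpha) numerically.
def alpha(v):
    x, y = v
    return np.array([x*y, x + np.sin(y)])

def beta(v):
    u, w = v
    return np.array([u**2 + w, np.exp(u) - w])

def jacobian(h, v, eps=1e-6):
    # Central-difference approximation of the Jacobian of h at v.
    n = len(v)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (h(v + e) - h(v - e)) / (2*eps)
    return J

x = np.array([0.3, 0.7])
lhs = jacobian(lambda v: beta(alpha(v)), x)      # Jacobian of the composite
rhs = jacobian(beta, alpha(x)) @ jacobian(alpha, x)  # product in chain-rule order
print(np.allclose(lhs, rhs, atol=1e-6))  # True
```

Note the order of the product on the right matters: the Jacobian of the outer map $\beta$ comes first.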

Now back to our situation. We have $\alpha = f$ (with $m=n$) and $\beta$ the linear map given by the matrix $(D_pf)^{-1}$ (with $k=n$). In this situation, $D_x \alpha = D_x f$, the Jacobian matrix of $f$ at $x$, and $D_{\alpha(x)} \beta = D_{f(x)}(D_pf)^{-1}$. The important thing to note at this point is that for a linear map (given by a matrix), the Jacobian matrix is just that matrix, and does not depend on the point. That is, $D_{f(x)}(D_pf)^{-1} = (D_pf)^{-1}$. So in conclusion, $$ D_x g = \mathrm{Id}_n - (D_pf)^{-1} \, D_xf $$

In particular, $D_pg = 0$.
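Both conclusions can be verified numerically for the simplified iteration $g(x) = x - (D_pf)^{-1}f(x)$ with $p$ fixed. The example $f$ below is made up (the same illustrative system as before), with root $p = (1/\sqrt2, 1/\sqrt2)$:

```python
import numpy as np

# Check that g(x) = x - (D_p f)^{-1} f(x), with p FIXED, satisfies
# D_x g = Id - (D_p f)^{-1} D_x f, and in particular D_p g = 0.
def f(v):
    x, y = v
    return np.array([x**2 + y**2 - 1.0, x - y])

def Df(v):
    x, y = v
    return np.array([[2*x, 2*y],
                     [1.0, -1.0]])

p = np.array([1/np.sqrt(2), 1/np.sqrt(2)])  # a root: f(p) = 0
A_inv = np.linalg.inv(Df(p))                # (D_p f)^{-1}, a constant matrix

def g(v):
    return v - A_inv @ f(v)

def jacobian(h, v, eps=1e-6):
    # Central-difference approximation of the Jacobian of h at v.
    n = len(v)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (h(v + e) - h(v - e)) / (2*eps)
    return J

print(np.allclose(jacobian(g, p), np.zeros((2, 2)), atol=1e-6))  # True: D_p g = 0
x = np.array([0.9, 0.4])
print(np.allclose(jacobian(g, x), np.eye(2) - A_inv @ Df(x), atol=1e-6))  # True
```

Since $D_pg = 0$ and $D_xg$ depends continuously on $x$ (for $f$ merely $C^1$), $\|D_xg\|$ stays below any fixed $c<1$ on a small enough closed ball about $p$, which is what makes $g$ a contraction there.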