Understanding the proof of Morse Lemma using homotopy method


Pages 52-56 of "Lectures on Morse Homology" by Augustin Banyaga and David Hurtubise present a proof of the Morse Lemma using the homotopy method, also known as Palais's proof using Moser's "path method". There are several statements I did not understand while reading the proof.

The proof starts like this: "By replacing f by f-f(p) and by choosing a suitable coordinate chart on M we may assume that the function f is defined on a convex neighborhood $U_0$ of $0\in R^m$ where $f(0)=0, df(0)=0$ and the matrix of the Hessian at $0\in R^m$, $M_0(f) = A = (\frac{\partial^2 f}{\partial x_i \partial x_j}(0))$, is a diagonal matrix with the first k diagonal entries equal to -1 and the rest equal to +1.

The matrix $A$ induces a function $\widetilde{A}(x) = x^tAx=\langle Ax,x\rangle=\sum_{j=1}^{m}\delta_j x_j^2$ where $\delta_j = \frac{\partial^2 f}{\partial x_j^2}(0) = \pm1$ for all $j=1,\dots,m$. We want to prove that there are neighborhoods $U$ and $U'$ of $0$ with $U \subseteq U_0$ and a diffeomorphism $\varphi: U\to U'$ such that:

$f\circ\varphi = \widetilde{A}$. (3.1)

The idea of the path method is to interpolate between $f$ and $\widetilde {A}$ by a path such as $f_t = \widetilde{A} + t(f-\widetilde{A})$ (3.2)

and to look for a smooth family $\varphi_t$ of diffeomorphisms such that $f_t\circ\varphi_t=f_0=\widetilde{A}$. (3.3)

Then $\varphi=\varphi_1$ will satisfy $f\circ\varphi=\widetilde{A}$.

We get $\varphi_t$ as a solution of the differential equation $\frac{d\varphi_t}{dt}(x)=\xi_t(\varphi_t(x))$, $\varphi_0(x)=x$, where the smooth family $\xi_t$ is the tangent along the curves $t\mapsto\varphi_t(x)$. Taking the partial derivative with respect to t of both sides of (3.3) gives

$(\dot{f_t}\circ\varphi_t + (\xi_t\cdot f_t)\circ\varphi_t)(x)=0$ (3.4) for all $x \in U$ where $\dot {f_t}$ denotes $\frac{\partial f_t}{\partial t}$.

Thus, $(\dot{f_t} + \xi_t \cdot f_t)(y) = 0$ (3.5) for all $y \in U'$.

But $\dot{f_t} = f - \widetilde{A}$ by (3.2), and therefore (3.5) becomes $df_t(\xi_t)=g$ (3.6) where $g=\widetilde{A} - f$."

My questions regarding this part of the proof are these:

First, "Taking the partial derivative with respect to t of both sides of (3.3) gives (3.4)" - why is that so? I tried defining a function $h:R^{m+1}\rightarrow R$ by $h(t,x)=f_t(x)$, so that $(f_t\circ\varphi_t)(x)=h(t,\varphi_t(x))$, and differentiating it using the chain rule. What I got is:

$\frac{\partial h}{\partial t}(t,\varphi_t)+(\xi_t \circ \varphi_t) \cdot \frac{\partial h}{\partial x}(t,\varphi_t)$

and not:

$\frac{\partial h}{\partial t}(t,\varphi_t)+(\xi_t \circ \varphi_t) \cdot h(t,\varphi_t)$

as it seems I should get by (3.4).

Second, "and therefore (3.5) becomes $df_t(\xi_t)=g$" - to my understanding (3.5) should become: $\xi_t \cdot f_t=g$, meaning that $df_t(\xi_t)=\xi_t \cdot f_t$ - why is this the case here?

Moreover, the proof ends like this (there is a middle part to the proof which I omitted because I understood it): "$B_0^t$ (which is defined by $(\frac{\partial^2 f_t}{\partial x_i \partial x_j})(0)$) is non-degenerate for all $0\leq t\leq 1$, and there exists a neighborhood $\widetilde{U}$ of $0 \in R^m$ such that $B_x^t$ is also non-degenerate for all t. For $x \in U$, we have a unique solution $\xi_t$ of $\langle B_x^t \xi_t,x\rangle=\langle G_x x,x\rangle$ and this solution depends smoothly on both x and t. That is, we have a smooth solution to (3.6) defined on $\widetilde{U}$. Clearly, $\xi_t(0)=0$ since $B_0^t$ is non-degenerate. Hence, by shrinking $\widetilde{U}$ we can integrate $\xi_t$ and get a smooth family of diffeomorphisms $\varphi_t$ from a smaller neighborhood $U$ of $0$ to another neighborhood $U'$ of $0$ which satisfies $f_t \circ \varphi_t = f_0 = \widetilde{A}$."

I also have two questions regarding this part of the proof:

  1. It is said that "$B_0^t$ is non-degenerate for all $0\leq t\leq 1$, and there exists a neighborhood $\widetilde{U}$ of $0 \in R^m$ such that $B_x^t$ is also non-degenerate for all t". I don't understand why this statement is correct. What we know is that for every t between 0 and 1, there exists a neighborhood $\widetilde{U}_t$ of $0 \in R^m$ such that $B_x^t$ is non-degenerate for all x in $\widetilde{U}_t$. The statement claims something stronger (a single neighborhood that works for all t), and I do not understand why it is correct.

  2. In the very last sentence of the proof it is said that "by shrinking $\widetilde{U}$ we can integrate $\xi_t$ and get a smooth family of diffeomorphisms $\varphi_t$ from a smaller neighborhood $U$ of $0$ to another neighborhood $U'$ of 0 which satisfies $f_t \circ \varphi_t=f_0=\widetilde{A}$." - what does "by shrinking" mean? Does it have anything to do with the neighborhood I asked about in my third question?


First of all, let me point out that this book uses the notational convention that if $X$ is a vector field on $M$ and $f \in C^\infty(M)$ then $X \cdot f = \mathrm{d} f(X)$.

This should explain your first two questions, as $(\xi_t \circ \phi_t) \cdot h(t,\phi_t) = \mathrm{d} h(t,\phi_t)(\xi_t \circ \phi_t)$, where I use $\mathrm{d} h$ to denote the derivative of $h$ with respect to the 'spatial variable' (for which you wrote $\frac{\partial h}{\partial x}$). Similarly, $\mathrm{d} f_t (\xi_t) = \xi_t \cdot f_t$ is purely a matter of notational convention.
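To spell out the chain-rule step, here is a short derivation. I write $h(t,y) = f_t(y)$ (my notation, not the book's) and use the ODE $\frac{\mathrm{d}\phi_t}{\mathrm{d}t}(x) = \xi_t(\phi_t(x))$ from your question. Since $f_t \circ \phi_t = \widetilde{A}$ does not depend on $t$, its $t$-derivative vanishes: \begin{equation} \begin{aligned} 0 = \frac{\mathrm{d}}{\mathrm{d}t}\bigl[h(t,\phi_t(x))\bigr] &= \frac{\partial h}{\partial t}(t,\phi_t(x)) + \mathrm{d}h(t,\phi_t(x))\Bigl(\frac{\mathrm{d}\phi_t}{\mathrm{d}t}(x)\Bigr) \\ &= \dot{f_t}(\phi_t(x)) + (\mathrm{d}f_t)_{\phi_t(x)}\bigl(\xi_t(\phi_t(x))\bigr) \\ &= \bigl(\dot{f_t}\circ\phi_t + (\xi_t\cdot f_t)\circ\phi_t\bigr)(x). \end{aligned} \end{equation}

This is (3.4). Writing $y = \phi_t(x)$ and using that $\phi_t$ maps $U$ onto $U'$ then gives (3.5).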

  1. It is said that "$B_0^t$ is non-degenerate for all $0\leq t\leq 1$, and there exists a neighborhood $\widetilde{U}$ of $0 \in R^m$ such that $B_x^t$ is also non-degenerate for all t". I don't understand why this statement is correct. What we know is that for every t between 0 and 1, there exists a neighborhood $\widetilde{U}_t$ of $0 \in R^m$ such that $B_x^t$ is non-degenerate for all x in $\widetilde{U}_t$. The statement claims something stronger (a single neighborhood that works for all t), and I do not understand why it is correct.

The stronger statement (that $(d^2 f_t)_x$ is non-degenerate in a neighbourhood of $[0,1] \times \{ 0 \}$) holds true because $(d^2 f_t)_0$ is non-degenerate for all times $t \in [0,1]$, including the endpoints. Loosely speaking, the radius around $0$ in which $(d^2 f_t)_x$ is non-degenerate (for a fixed $t$) cannot tend to $0$ as $t$ tends to $0$ or $1$. You can make this more rigorous with the following lemma.

Let $U$ be an open neighbourhood of $0$ in $\mathbb{R}^n$ and let $B: [0,1] \times U \to M_n(\mathbb{R})$ be a continuous map. Suppose that for all $t \in [0,1]$, $B(t,0)$ is invertible. Then there is an open neighbourhood of $[0,1] \times \{0 \}$ in $[0,1] \times U$, $V$ say, such that $B(t,x)$ is invertible for all $(t,x) \in V.$

You can prove this as follows: since invertibility is an open condition and $B$ is continuous, for all $t \in [0,1]$ there is an open subset $V_t \subset [0,1] \times U$ containing $(t,0)$ such that $B$ is invertible in $V_t$. Possibly after taking a smaller $V_t$ you may assume it to be of the 'cylindrical' form $( a(t),b(t) ) \times B(0,r(t))$ when $t \in (0,1)$, of the form $(a(1),1] \times B(0,r(1))$ when $t = 1$, and of the form $[0,b(0)) \times B(0,r(0))$ when $t = 0$. (Here $B(0,r)$ denotes the open ball of radius $r$ around $0$, not the map $B$.)

Then the intervals $\{ (a(t), b(t)) \mid t \in (0,1) \} \cup \{ [0,b(0)) \} \cup \{ (a(1),1] \}$ form an open cover of the compact interval $[0,1]$, and we may extract a finite subcover by say $N$ intervals, corresponding to times $t_1 < t_2 < \cdots < t_N$. If we let $r = \min_i r(t_i) > 0$, then $B$ is invertible on $[0,1] \times B(0,r)$: every $t \in [0,1]$ lies in the interval belonging to some $t_i$, and $B(0,r) \subset B(0,r(t_i))$.

Here of course you apply this lemma to $B(t,x) = (d^2 f_t)_x$, and this map is continuous because $f$ is smooth (thus $C^2$).
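As a concrete one-dimensional illustration (my own example, not taken from the book), take $\widetilde{A}(x) = x^2$ and $f(x) = x^2 + x^3$, so that $f_t(x) = x^2 + t x^3$. Then \begin{equation} (d^2 f_t)_x = 2 + 6 t x, \end{equation} which vanishes only at $x = -\frac{1}{3t}$ for $t \in (0,1]$. For a fixed $t$ the largest ball around $0$ on which $(d^2 f_t)_x$ is non-degenerate therefore has radius $\frac{1}{3t}$ (infinite when $t = 0$), and the uniform radius produced by the lemma can be taken to be $r = \min_{t \in [0,1]} \frac{1}{3t} = \frac{1}{3}$, attained at $t = 1$.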

  2. In the very last sentence of the proof it is said that "by shrinking $\widetilde{U}$ we can integrate $\xi_t$ and get a smooth family of diffeomorphisms $\varphi_t$ from a smaller neighborhood $U$ of $0$ to another neighborhood $U'$ of 0 which satisfies $f_t \circ \varphi_t=f_0=\widetilde{A}$." - what does "by shrinking" mean? Does it have anything to do with the neighborhood I asked about in my third question?

You want to show that the flow $(\phi_t)$ of the time-dependent vector field $(\xi_t)$ exists for all times up to $t = 1$. Now, the Cauchy-Lipschitz theorem for ODEs gives you the existence of a solution, but only locally in time: for all $(t_0 , x_0) \in [0,1] \times U$ there is a unique solution $x: J \to U$ defined on some (potentially small) interval $J \subset [0,1]$ containing $t_0$ with \begin{equation} \begin{cases} \frac{\mathrm{d} x(t)}{\mathrm{d} t} = \xi_t(x(t)), \\ x(t_0) = x_0. \end{cases} \end{equation}

Here we consider an initial time $t = 0$, so the interval $J$ is half-open. Now, the problem is that a priori the flow of $\xi_t$ can escape $U$ before $t = 1$; this is possible even when the vector field $\xi_t$ is Lipschitz on the open set $U$!

Here the issue can be remedied using the fact that $\xi_t(0) = 0$ for all $t$. This means that the constant function $x(t) \equiv 0$ is a solution of your ODE. Moreover, after shrinking $U$ to a ball $B(0,\rho)$ whose closure lies in the original domain, the vector field $\xi_t$ is Lipschitz in the second variable, uniformly in $t$ (it is smooth on the compact set $[0,1] \times \overline{B(0,\rho)}$): there exists $K \geq 0$ such that for all $t \in [0,1]$ and all $x, y \in B(0,\rho)$, $\lvert \xi_t(x) - \xi_t(y) \rvert \leq K \lvert x - y \rvert$. You can therefore use Gronwall's lemma to control the norm of a solution $x(t)$. Indeed, as \begin{equation} \lvert x(t) - 0 \rvert \leq \lvert x(0) - 0 \rvert + \int_0^t \lvert \xi_s(x(s)) - 0 \rvert \mathrm{d} s \leq \lvert x(0) - 0 \rvert + K \int_0^t \lvert x(s) \rvert \mathrm{d} s, \end{equation} the Gronwall lemma implies that $\lvert x(t) \rvert \leq e^{Kt} \lvert x(0) \rvert$ for all $t \in J$, its interval of existence. To ensure that $x(t)$ stays inside $B(0,\rho)$, and hence exists for all times up to $t = 1$, it is enough to pick $\lvert x(0) \rvert < \rho e^{-K}$. Using the notation from your question, the smaller neighbourhood $U$ is then the ball $B(0,\rho e^{-K})$, and $U'$ is its image under $\phi_1$. (In what precedes $U$ was used for what you wrote as $\widetilde{U}$; hopefully there's no confusion!)
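If it helps to see the path method at work, here is a minimal numerical sketch in one dimension. The example is my own assumption, not the book's: $\widetilde{A}(x) = x^2$ and $f(x) = x^2 + x^3$, so $f_t(x) = x^2 + t x^3$ and $g = \widetilde{A} - f = -x^3$. Solving $\mathrm{d}f_t(\xi_t) = g$ gives $\xi_t(x) = -x^2/(2 + 3tx)$, which is smooth near $0$ and satisfies $\xi_t(0) = 0$. The function names `f_t`, `xi` and `flow` are hypothetical helpers, and the integrator is a plain fixed-step Runge-Kutta scheme.

```python
def f_t(x, t):
    # Interpolating family f_t = A~ + t (f - A~), with A~(x) = x^2 and f(x) = x^2 + x^3.
    return x**2 + t * x**3


def xi(t, x):
    # Solution of f_t'(x) * xi_t(x) = g(x) = -x^3, i.e. xi_t(x) = -x^2 / (2 + 3 t x).
    # Only valid where 2 + 3 t x != 0, e.g. for |x| < 1/3.
    return -x**2 / (2.0 + 3.0 * t * x)


def flow(x0, t_end=1.0, steps=1000):
    # Integrate dx/dt = xi_t(x), x(0) = x0, with classical RK4; returns phi_{t_end}(x0).
    h = t_end / steps
    x, t = x0, 0.0
    for _ in range(steps):
        k1 = xi(t, x)
        k2 = xi(t + h / 2, x + h * k1 / 2)
        k3 = xi(t + h / 2, x + h * k2 / 2)
        k4 = xi(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


for x0 in (0.1, -0.1, 0.05):
    x1 = flow(x0)
    # f_1(phi_1(x0)) should agree with A~(x0) = x0^2 up to integration error.
    print(x0, x1, abs(f_t(x1, 1.0) - x0**2))
```

The printed residuals should be at the level of the integration error, which reflects $f_1 \circ \phi_1 = \widetilde{A}$ on a small neighbourhood of $0$. Starting too far from $0$ (for instance near $x = -1/3$) makes the denominator of `xi` vanish, which is the numerical counterpart of having to shrink the neighbourhood.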