I'm learning a very tiny bit of Morse theory, from this book.
(Lemma of Morse): Let $p$ be a non-degenerate critical point for $f$. Then there is a local coordinate system $(y^1, \ldots, y^n)$ in a neighborhood $U$ of $p$ with $y^i(p) = 0$ for all $i$ and such that the identity $$ f = f(p) - (y^1)^2 - \ldots - (y^\lambda)^2 + (y^{\lambda+1})^2 + \ldots + (y^n)^2 $$ holds throughout $U$, where $\lambda$ is the index of $f$ at $p$.
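To fix ideas, here is a toy example of my own (not from the book): the saddle $f(x,y) = -x^2 + y^2$ is already in the normal form of the lemma, with index $\lambda = 1$. A quick sympy check of the Hessian and the index:

```python
import sympy as sp

x, y = sp.symbols('x y')
# A toy example of my own (not from the book), already in the
# normal form of the lemma with f(p) = 0 and lambda = 1:
f = -x**2 + y**2

# The origin is a critical point: both partials vanish there.
assert all(sp.diff(f, v).subs({x: 0, y: 0}) == 0 for v in (x, y))

# Hessian at 0 is diag(-2, 2): non-degenerate, and the index
# (number of negative eigenvalues) is 1.
H = sp.hessian(f, (x, y)).subs({x: 0, y: 0})
index = sum(m for ev, m in H.eigenvals().items() if ev < 0)
print(H, index)    # Matrix([[-2, 0], [0, 2]]) 1
```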
The proof is below, written in full until the bit I don't understand.
Proof: We first show that if there is any such expression for $f$, then $\lambda$ must be the index of $f$ at $p$. For any coordinate system $(z^1, \ldots, z^n)$, if $$ f(q) = f(p) - (z^1(q))^2 - \ldots - (z^\lambda(q))^2 + (z^{\lambda+1}(q))^2 + \ldots + (z^n(q))^2 $$ then we have $$ \frac{\partial^2 f}{\partial z^i \partial z^j}(p) = \begin{cases} -2 & i = j \leq \lambda \\ 2 & i = j > \lambda \\ 0 & \text{otherwise} \end{cases} $$ which shows that the matrix representing $f_{**}$ w.r.t. the basis $\frac{\partial}{\partial z^1}(p),\ldots, \frac{\partial}{\partial z^n}(p)$ is $$ \text{diag}\left(\underbrace{-2,\ldots,-2}_{\lambda \;\text{times}},\underbrace{2,\ldots, 2}_{n - \lambda \;\text{times}} \right). $$ Therefore there is a subspace of $TM_p$ of dimension $\lambda$ on which $f_{**}$ is negative definite, and a subspace $V$ of dimension $n-\lambda$ on which $f_{**}$ is positive definite. If there were a subspace $W$ of $TM_p$ of dimension greater than $\lambda$ on which $f_{**}$ were negative definite, then $\dim(W \cap V) \geq \dim W + \dim V - n > 0$, so $W$ would intersect $V$ in a nonzero vector, on which $f_{**}$ would be both negative and positive, which is clearly impossible. Therefore $\lambda$ is the index of $f_{**}$. We now show that a suitable coordinate system $(y^1, \ldots, y^n)$ exists. Obviously we can assume that $p$ is the origin of $\mathbb{R}^n$ and that $f(p) = f(0) = 0$.
At this point Lemma 2.1 from the book is also used; you can look up its statement if needed, since it's really simple. Continuing with the proof:
By 2.1 we can write $$ f(x_1,\ldots, x_n) = \sum_{j=1}^n x_j g_j(x_1,\ldots, x_n) $$ for $(x_1, \ldots, x_n)$ in some neighborhood of $0$. Since $0$ is assumed to be a critical point, $g_j(0) = \frac{\partial f}{\partial x^j}(0) = 0$. Therefore, applying 2.1 to each $g_j$, we have $$ g_j(x_1,\ldots,x_n) = \sum_{i=1}^n x_i h_{ij}(x_1,\ldots,x_n) $$ for certain smooth functions $h_{ij}$. It follows that $$ f(x_1,\ldots,x_n) = \sum_{i,j=1}^n x_i x_j h_{ij}(x_1,\ldots,x_n). $$ We can assume that $h_{ij} = h_{ji}$: writing $\bar{h}_{ij} = \frac{1}{2}(h_{ij} + h_{ji})$ we have $\bar{h}_{ij} = \bar{h}_{ji}$, and the sum above is unchanged if we replace $h_{ij}$ by $\bar{h}_{ij}$. Moreover $(\bar{h}_{ij}(0))$ is equal to $\left( \frac{1}{2} \frac{\partial^2 f}{\partial x^i \partial x^j}(0) \right)$, hence is non-singular.
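The formula I believe lies behind Lemma 2.1 (Hadamard's lemma; the book's statement may differ slightly) is $g_j(x) = \int_0^1 \frac{\partial f}{\partial x_j}(tx)\,dt$. Here is a sympy sketch on a toy polynomial of my own, checking both applications of the lemma and that $(\bar{h}_{ij}(0))$ is half the Hessian at $0$:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')
# Toy polynomial of my own (not from the book): f(0) = 0 and the
# origin is a non-degenerate critical point.
f = x**2 + x*y + y**2 + x**2*y

# Hadamard-style formula: g_j(x) = integral_0^1 (df/dx_j)(t*x) dt,
# so that f = sum_j x_j * g_j.
def hadamard(F):
    return [sp.integrate(sp.diff(F, v).subs({x: t*x, y: t*y}, simultaneous=True),
                         (t, 0, 1))
            for v in (x, y)]

g = hadamard(f)
assert sp.expand(f - (x*g[0] + y*g[1])) == 0

# 0 is critical, so g_j(0) = 0 and the lemma applies again:
# g_j = sum_i x_i * h_ij, hence f = sum_{i,j} x_i x_j h_ij.
h = [hadamard(gj) for gj in g]          # h[j][i] corresponds to h_ij
# Symmetrized coefficients, evaluated at 0:
hbar0 = sp.Matrix(2, 2,
                  lambda i, j: ((h[j][i] + h[i][j]) / 2).subs({x: 0, y: 0}))
# (hbar_ij(0)) equals half the Hessian of f at 0, hence is non-singular.
assert hbar0 == sp.hessian(f, (x, y)).subs({x: 0, y: 0}) / 2
print(hbar0)
```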
Now we have the bit where I need a clarification
There is a non-singular transformation of the coordinate functions which gives us the desired expression for $f$, in a perhaps smaller neighborhood of $0$. To see this we just imitate the usual diagonalization proof for quadratic forms. The key step is as follows. Suppose by induction that there exist coordinates $u_1,\ldots,u_n$ in a neighborhood $U_1$ of $0$ so that $$ f = \pm (u_1)^2 \pm \ldots \pm (u_{r-1})^2 + \sum_{i,j \geq r} u_i u_j H_{ij}(u_1,\ldots, u_n) $$ throughout $U_1$, where the matrix $(H_{ij}(u_1,\ldots,u_n))$ is symmetric. After a linear change in the last $n-r+1$ coordinates we may assume that $H_{rr}(0) \neq 0$.
Question 1: Why can we make such an assumption? Continuing with the proof:
Let $g(u_1,\ldots,u_n)$ denote the square root of $\left| H_{rr}(u_1,\ldots,u_n)\right|$. This will be a smooth, non-zero function of $u_1,\ldots,u_n$ throughout some smaller neighborhood $U_2 \subset U_1$ of $0$. Now introduce new coordinates $v_1,\ldots,v_n$ by $$ \begin{cases} v_i = u_i & i \neq r \\ v_r(u_1,\ldots,u_n) = g(u_1,\ldots,u_n) \left[u_r + \sum_{i>r} u_i H_{ir}/H_{rr}(u_1,\ldots,u_n) \right] & i = r \end{cases} $$ It follows from the inverse function theorem that $v_1,\ldots,v_n$ will serve as coordinate functions within some sufficiently small neighborhood $U_3$ of $0$.
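To convince myself what this change of coordinates achieves, here is a toy constant-coefficient instance of my own ($r = 1$, $n = 2$, numbers made up): after the substitution, the $u_1$-terms collapse into $\pm v_1^2$ and only $u_2$-terms remain.

```python
import sympy as sp

u1, u2 = sp.symbols('u1 u2')
# Toy instance of the induction step (my own numbers, r = 1, n = 2),
# with constant symmetric coefficients H_ij:
H11, H12, H22 = -4, 2, 3          # H11 = H_rr(0) != 0
f = H11*u1**2 + 2*H12*u1*u2 + H22*u2**2

# The coordinate change from the proof:
#   g = sqrt(|H_rr|),  v_r = g * (u_r + sum_{i>r} u_i * H_ir / H_rr)
g = sp.sqrt(abs(H11))
v1 = g * (u1 + u2 * H12 / sp.Integer(H11))

# After the change, f = sign(H_rr) * v1**2 + (terms in u2 only):
remainder = sp.expand(f - sp.sign(H11) * v1**2)
print(remainder)    # 4*u2**2  -- u1 no longer appears, no cross term
```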
Question 2: How exactly is the inverse function theorem applied here? My reference for this theorem is "Theorem 6.26" of Tu's An Introduction to Manifolds, which I'll write down with the same notation used by Tu:
Theorem 6.26 (Inverse function theorem for manifolds). Let $F : N \to M$ be a $C^{\infty}$ map between two manifolds of the same dimension, and $p \in N$. Suppose for some charts $(U,\phi) = (U,x^1,\ldots,x^n)$ about $p \in N$ and $(V,\psi) = (V,y^1,\ldots,y^n)$ about $F(p)$ in $M$, $F(U) \subset V$. Set $F^i = y^i \circ F$. Then $F$ is locally invertible at $p$ if and only if its Jacobian determinant $\det \left[\frac{\partial F^i}{\partial x^j}(p) \right]$ is nonzero.
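For concreteness, here is the Jacobian computation I think the theorem calls for, in a toy instance of my own ($r = 1$, $n = 2$, constant coefficients; the map is $(u_1,u_2) \mapsto (v_1,v_2)$):

```python
import sympy as sp

u1, u2 = sp.symbols('u1 u2')
# Toy instance (my own constants, r = 1, n = 2): H_rr = H11 = -4, H12 = 2.
H11, H12 = -4, 2
g = sp.sqrt(abs(H11))                       # sqrt(|H_rr|), nonzero near 0
v1 = g * (u1 + u2 * H12 / sp.Integer(H11))  # the new r-th coordinate
v2 = u2                                     # v_i = u_i for i != r

# Jacobian of (u1, u2) |-> (v1, v2); it is constant here because the
# H_ij are constants, and it is lower-triangular with g on the diagonal.
J = sp.Matrix([[sp.diff(v, u) for u in (u1, u2)] for v in (v1, v2)])
print(J.det())   # 2, i.e. g(0) != 0, so the map is locally invertible
```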
I can't really figure out, though, how to apply this theorem here.
The final bit of the Morse Lemma is to show an explicit expression for the coordinate change, which I believe I can work out later as an exercise.
Can you clarify the bits of the proof I don't get?