$\def\RR{\mathbb{R}}$
On page 2 of Gehring & Halmos' General Theory of Relativity for Mathematicians the following definitions are made:
Definition: an inner product $g$ is a function $V^2\to \RR$ such that
- $g$ is bilinear.
- $g$ is symmetric.
- $g$ is nondegenerate, meaning that for any non-zero $x$ there is a $y$ such that $g(x,y)\ne 0$.
*The third condition is weaker than the usual positive-definiteness needed to make $g$ a proper inner product.
Definition: letting $$S=\{W|W \text{ is a subspace of } V \text { and } g|_W \text{ is negative definite}\}$$ we define the index $I$ of $g$ as the integer $$I:=\max_{W\in S} \dim W.$$
Definition: a basis $B=\{e_1,\ldots,e_N\}$ of $V$, with dual basis $\{e^1,\ldots,e^N\}$, is called orthonormal (with respect to the inner product $g$) iff $$g = \sum_{a=1}^{N-I}e^a\otimes e^a - \sum_{a=N-I+1}^N e^a\otimes e^a$$ where the appropriate sum is zero if $I=0$ or $I=N$. Equivalently, we say $B$ is orthonormal iff \begin{split} g(e_a,e_b) & = 0 \text{ if } a \ne b\\ g(e_a,e_a) & = \begin{cases} 1 & \text{ if } 1\le a \le N-I\\ -1 & \text{ if } N-I+1 \le a \le N\\ \end{cases}\\ \end{split}
What geometrical intuition may be given to the concepts just defined (inner product, index, and orthonormal basis)?
I understand that, since $g(x,y)$ may be negative, we may be interested in the 'largest' subspace of $V$ which makes $g$ negative definite, and that, when defining the analogue of an orthonormal basis, the best we can hope for is $g(e_a,e_a)=\pm 1$, yet I remain with little intuition as to what these notions can mean geometrically.
Consider the following setting. Let $\mathcal{E}=(e_1,\dots,e_N)$ be a basis of $V$, and let $v,w\in V$ have coordinates $$\overline{v}=\begin{pmatrix}v_1\\ \vdots\\v_N\end{pmatrix}\text{ and }\overline{w}=\begin{pmatrix}w_1\\ \vdots\\w_N\end{pmatrix}$$ in the basis $\mathcal{E}$. Define the matrix $M$ by its coefficients $m_{ij}:=g(e_i,e_j)$. Then you have the following relation: $$g(v,w)=\,^t\overline{v}\cdot M\cdot \overline{w}.$$

Since $M$ is symmetric, a suitable choice of the basis $\mathcal{E}$ makes $M$ diagonal. Since $g$ is nondegenerate, you can check that the diagonal coefficients are then all nonzero, and by rescaling the vectors of $\mathcal{E}$ (replace $e_i$ by $e_i/\sqrt{|m_{ii}|}$) you can make each of them $+1$ (say $m_{ii}$ for $1\leq i\leq N-I$) or $-1$ ($m_{jj}$ for $N-I+1\leq j\leq N$). That the number of $-1$'s is exactly the index $I$ is Sylvester's law of inertia.

What does $g$ look like in this basis? Take a vector $x$ with coordinates $$\overline{x}=\begin{pmatrix}x_1\\ \vdots\\x_N\end{pmatrix}$$ in $\mathcal{E}$. Then: $$g(x,x)=\,^t\overline{x}\cdot M\cdot \overline{x}=\begin{pmatrix}x_1& \dots&x_N\end{pmatrix}\begin{pmatrix} \mathrm{Id}_{N-I}&0\\ 0&-\mathrm{Id}_{I} \end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\x_N\end{pmatrix}=x_1^2+\dots+x_{N-I}^2-x_{N-I+1}^2-\dots-x_N^2.$$ Thus, geometrically speaking, the index is the number of directions in which the quadratic form $$x\mapsto g(x,x)$$ looks like $t\mapsto -t^2$ when computed in an orthonormal basis (and in the other directions of this basis it looks like $t\mapsto +t^2$).

Now, let's suppose $V$ is a normed space, take a function $f:V\to \mathbb{R}$ of class $C^2$, and consider $a\in V$.
Then you will have the Taylor formula $$f(a+x)=f(a)+Df_a(x)+\tfrac{1}{2}D^2f_a(x,x)+o(\|x\|^2)$$ where $Df_a$ is the derivative of $f$ at $a$ and $D^2f_a$ is the second derivative of $f$ at $a$, namely the symmetric bilinear form given by $D^2f_a(u,v)=\partial_v|_a(\partial_u f)$ where $$\partial_w|_bf:=\lim\limits_{t\to 0}\frac{f(b+tw)-f(b)}{t}$$ and $\partial_u f$ is the function $a\mapsto\partial_u|_af$. If $a$ is a critical point (i.e. $Df_a=0$) and is nondegenerate (i.e. $D^2f_a$ is nondegenerate in the sense of your definition), then we know the local behavior of $f$ around $a$, since $\tfrac{1}{2}D^2f_a$ is the best second-order approximation of $f-f(a)$ we can get: up to a linear change of coordinates and negligible terms, $f-f(a)$ is a sum of plus and minus squares, and the number of minus signs is the index of $D^2f_a$. This local description is one reason to study such inner products: functions with only nondegenerate critical points are called Morse functions, and they are studied for the good properties this local description provides (you can even get rid of the negligible terms mentioned above by means of the Morse lemma, for example).
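
To make this concrete, here is a small worked example. The particular form $g$ and function $f$ below are my own choice for illustration and are not taken from the book. Take $V=\mathbb{R}^2$ with its standard basis $(e_1,e_2)$ and $$g(x,y)=x_1y_2+x_2y_1,\qquad M=\begin{pmatrix}0&1\\1&0\end{pmatrix}.$$ This $g$ is bilinear, symmetric and nondegenerate ($\det M=-1\neq 0$), but it is not positive definite, since $g(e_1,e_1)=0$. Setting $$e_1'=\tfrac{1}{\sqrt2}(e_1+e_2),\qquad e_2'=\tfrac{1}{\sqrt2}(e_1-e_2),$$ a direct computation gives $$g(e_1',e_1')=1,\qquad g(e_2',e_2')=-1,\qquad g(e_1',e_2')=0,$$ so $(e_1',e_2')$ is orthonormal in the sense of the question and the index is $I=1$. In the corresponding coordinates $x_1'=\tfrac{1}{\sqrt2}(x_1+x_2)$ and $x_2'=\tfrac{1}{\sqrt2}(x_1-x_2)$, $$g(x,x)=2x_1x_2=(x_1')^2-(x_2')^2.$$ On the analytic side, the function $f(x_1,x_2)=x_1x_2$ has a critical point at $a=0$, and its second derivative there is exactly this form: $D^2f_0(u,v)=u_1v_2+u_2v_1=g(u,v)$. So $0$ is a nondegenerate critical point with one direction ($e_1'$) along which $f$ increases and one direction ($e_2'$) along which $f$ decreases, i.e. a saddle, and the index $I=1$ counts the decreasing direction.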