How does an inner product induce a definition for angles?


A metric gives a notion of distance, a norm gives a notion of length and distance, and an inner product gives a notion of angles, lengths, and distances. But how does it do so?

In $\Bbb R^n$, you can use $\cos^{-1}(\langle u,v \rangle/ (\|u\| \|v\|))$, and in hyperbolic space, you can use $\cosh^{-1}(\langle u,v \rangle/ (\|u\| \|v\|))$, but that requires a priori knowledge of the cosine function that corresponds to a given space. How can you derive this cosine variant solely from the inner product?


On BEST ANSWER

One can define $\cos$ purely analytically, by the Taylor series $$ \cos x= 1- \frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}+\dotsb $$ (Alternatively, $\cos$ can be defined as the unique solution to the initial value problem $y''=-y$, with the conditions $y'(0)=0$ and $y(0)=1$. Neither of these definitions relies on geometric notions such as "angle", although there is some work involved in ensuring that they make sense in the first place.)
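For concreteness, here is a quick numerical sketch (my own illustration, in Python) showing that the partial sums of this Taylor series really do converge to the familiar cosine:

```python
import math

def cos_taylor(x, terms=20):
    """Approximate cos(x) by its Taylor series: sum of (-1)^k x^(2k) / (2k)!."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k) / math.factorial(2 * k)
    return total

# For moderate x, twenty terms already agree with math.cos to machine precision.
assert abs(cos_taylor(1.0) - math.cos(1.0)) < 1e-12
assert abs(cos_taylor(0.0) - 1.0) < 1e-15
```

Of course, this only illustrates convergence; the point of the analytical definition is that the series itself can be taken as primitive.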

Once we have established that $\cos$ is one-to-one on the interval $[0,\pi]$, we can define $\arccos$ as the inverse of the restriction of $\cos$ to this interval. Then, we can define the angle between the vectors $\mathbf u$ and $\mathbf v$ as $$ \arccos\left(\frac{\mathbf{u}\cdot\mathbf{v}}{\vert\mathbf u\vert \vert \mathbf v\vert}\right) \, . $$ Finally, we can give an analytical definition of arc length, as described on Wikipedia. Having done all this, it can be verified that the above definition is equivalent to the following geometric definition: the angle between the vectors $\mathbf u$ and $\mathbf v$ is the length of a circular arc with endpoints that lie on $\mathbf u$ and $\mathbf v$, divided by the arc's radius. (To be ultra-precise, we should specify that we choose the smaller of the two possible arcs going from $\mathbf{u}$ to $\mathbf{v}$, as $\arccos$ ought to give you the acute or obtuse angle, not the reflex angle.)
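The definition above translates directly into code; a minimal sketch (the clamping step is my own addition, to guard against floating-point drift pushing the ratio slightly outside $[-1,1]$):

```python
import math

def angle(u, v):
    """Angle between vectors u and v via arccos of the normalized inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] before arccos, in case of rounding error.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

assert abs(angle([1, 0], [0, 1]) - math.pi / 2) < 1e-12  # orthogonal vectors
assert abs(angle([1, 0], [1, 1]) - math.pi / 4) < 1e-12
```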

Now, you might regard all of this as "cheating" – and in a way, it is. After all, what is the motivation behind defining $\cos$ analytically, if it is not obvious that this agrees with the traditional geometric definition? And this geometric definition does use the notion of "angle", making this whole process seem circular. The truth is that the path to arriving at the analytical definition is rather circuitous. A more honest account of what a mathematician might do is as follows: first, she might define $\cos$ geometrically, without paying much attention to what an angle "really" is; second, she would develop the theory of trigonometric functions semi-rigorously; third, she would derive the Taylor series of $\cos$ from the geometric definition; fourth, she would acknowledge the lack of rigour in this approach—after all, the measure of an angle is fundamentally an analytical concept; and fifth, she would get around this problem by using the analytical characterisation of $\cos$ as the definition, before showing that this is equivalent to the more familiar geometric characterisation.

It is the same story with $\cosh$: we can use its analytical characterisation to give a rigorous definition of an angle in hyperbolic space. And the motivation behind doing this comes from the fact that earlier, we used an informal geometric definition of $\cosh$ to "derive" the analytical definition.


$ \newcommand\R{\mathbb R} \newcommand\Cl{\mathrm{Cl}} $

Let $V$ be a real vector space equipped with a symmetric bilinear form $B : V\times V \to \R$; that is, $B$ is linear in each argument and $B(v,w) = B(w,v)$. For brevity, we will write $v^2 := B(v,v)$. There are three possible types of vectors, corresponding to the sign of $v^2$: call these negative, null, and positive. A vector $v$ is a unit vector if $v^2 = \pm1$. We say two vectors $v,w$ are orthogonal if $B(v,w) = 0$.
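To make this classification concrete, here is a small sketch (my own example, using a Minkowski-type form on $\R^2$; the form and the tolerance are assumptions for illustration):

```python
# Example symmetric bilinear form on R^2: B((a,b),(c,d)) = a*c - b*d.
def B(x, y):
    return x[0] * y[0] - x[1] * y[1]

def classify(v, tol=1e-12):
    """Classify a vector by the sign of B(v, v): positive, negative, or null."""
    q = B(v, v)
    if q > tol:
        return "positive"
    if q < -tol:
        return "negative"
    return "null"

assert classify((1, 0)) == "positive"
assert classify((0, 1)) == "negative"
assert classify((1, 1)) == "null"  # B((1,1),(1,1)) = 1 - 1 = 0
```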

Let $v,w$ be orthogonal unit vectors and let $P$ be the plane spanned by them. We take $v$ as our reference, i.e. the vector we want to measure an angle against. This is justified in some cases by an angle of 0 giving back $v$, but we'll see this isn't always possible.

When $v$ and $w$ are both positive, call $P$ a Euclidean plane. Every element of $P$ satisfies $$ (av + bw)^2 = a^2 + b^2 \geq 0, $$ hence we may parameterize $$ u = (r\cos\theta)v + (r\sin\theta)w,\quad r\geq0. $$ Unit vectors in $P$ are those with $r=1$, and in this case $$ B(u, v) = \cos\theta. $$ This shows that for any two positive unit vectors, there is an angle $\theta$ characterizing their separation, with orthogonal vectors being "most separated".
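A numerical check of this (my own sketch, representing vectors of the plane by their coordinates $(a,b)$ in the basis $v, w$, so that $B((a,b),(c,d)) = ac + bd$):

```python
import math

# Euclidean plane: coordinates with respect to orthogonal positive unit vectors v, w.
def B(x, y):
    return x[0] * y[0] + x[1] * y[1]

theta = 0.7
v = (1.0, 0.0)
u = (math.cos(theta), math.sin(theta))  # unit vector at angle theta from v

assert abs(B(u, u) - 1.0) < 1e-12       # u is a (positive) unit vector
assert abs(B(u, v) - math.cos(theta)) < 1e-12  # B(u, v) recovers cos(theta)
```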

When $v,w$ are negative, call $P$ an anti-Euclidean plane; in this case the discussion is (almost) exactly the same as in the Euclidean case.

When $v$ is positive and $w$ is negative, call $P$ a hyperbolic plane; this is not the hyperbolic plane from hyperbolic geometry, but merely an overlap in terminology. Then we see $$ (av + bw)^2 = a^2 - b^2, $$ and hence we may parameterize $$ u = (r\cosh\xi)v + (r\sinh\xi)w \tag{$+$} $$ if we want $u$ to be positive, and $$ u = (r\sinh\xi)v + (r\cosh\xi)w \tag{$-$} $$ if we want $u$ to be negative. There are additionally two null parameterizations $$ u = \pm rv \mp rw \tag{$0$} $$ where in all cases $r$ is arbitrary. In the ($+$, $-$) cases, $u$ is a unit when $r=\pm1$, in which case $$ B(u,v) = \pm\cosh\xi, \tag{$+$} $$$$ B(u,v) = \pm\sinh\xi. \tag{$-$} $$ We see that vectors with the same sign in a hyperbolic plane can never be orthogonal, and when they have differing signs their "angle" measures how close they are, rather than their separation ($\xi=0$ giving orthogonality).
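The same kind of coordinate check works here (again my own sketch, with coordinates in the basis $v, w$, so $B((a,b),(c,d)) = ac - bd$):

```python
import math

# Hyperbolic plane: v = (1,0) is positive, w = (0,1) is negative.
def B(x, y):
    return x[0] * y[0] - x[1] * y[1]

xi = 1.3
u_pos = (math.cosh(xi), math.sinh(xi))  # the (+) parameterization with r = 1
u_neg = (math.sinh(xi), math.cosh(xi))  # the (-) parameterization with r = 1

assert abs(B(u_pos, u_pos) - 1.0) < 1e-9   # positive unit vector
assert abs(B(u_neg, u_neg) + 1.0) < 1e-9   # negative unit vector
assert abs(B(u_pos, (1, 0)) - math.cosh(xi)) < 1e-9  # "angle" xi via cosh
assert abs(B(u_neg, (1, 0)) - math.sinh(xi)) < 1e-9  # and via sinh
```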

In the ($0$) case, there is no concept of "unit"; however, we can confirm that there are two unique pairs of null vectors $\eta, \nu$ and $-\eta, -\nu$ such that $B(\eta, \nu) = 1$ (called hyperbolic bases for $P$). We may take $$ \eta = \frac{v + w}{\sqrt2},\quad \nu = \frac{v - w}{\sqrt2}, $$ whence we see $$ B(\pm\eta, v) = \pm\frac1{\sqrt2},\quad B(\pm\nu, v) = \pm\frac1{\sqrt2}. $$ This could be interpreted as saying that these null vectors are equally separated from the non-null vectors; they are isolated.
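These claims about the null pair are easy to verify numerically (my own sketch, same coordinates as before):

```python
import math

# Hyperbolic plane coordinates: B((a,b),(c,d)) = a*c - b*d, v = (1,0).
def B(x, y):
    return x[0] * y[0] - x[1] * y[1]

s = 1 / math.sqrt(2)
eta = (s, s)    # (v + w)/sqrt(2)
nu = (s, -s)    # (v - w)/sqrt(2)

assert abs(B(eta, eta)) < 1e-12        # eta is null
assert abs(B(nu, nu)) < 1e-12          # nu is null
assert abs(B(eta, nu) - 1.0) < 1e-12   # hyperbolic basis pairing
assert abs(B(eta, (1, 0)) - s) < 1e-12  # separation 1/sqrt(2) from v
assert abs(B(nu, (1, 0)) - s) < 1e-12   # and the same for nu
```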

When $v^2 = \pm1$ and $w$ is null, then $$ (av+bw)^2 = a^2v^2 $$ and we may parameterize $$ u = rv + rdw. $$ Unit vectors in this case are such that $r = \pm1$, and we see $$ B(u,v) = \pm1, $$ depending on the sign of $r$ and of $v$. In this case it would be fair to interpret all such unit vectors as being "separated" in the same fashion. In fact, we may interpret $P$ as a projective line where non-null vectors represent points on the line, and null vectors (the multiples of $w$) are the point at infinity. Doing so, $d$ then has the interpretation of distance between points.

Finally, when $v$ is null, then $B(u,v) = 0$ for all $u \in P$ and there is nothing interesting to say.


Edit:

I want to expand on how specifically the functions $\cos, \sin, \cosh, \sinh$ arise, but this is necessarily going to require more background than the first part of my answer (which is why I wrote that first). I also want to note that I feel like a similar answer to the following can be given in terms of Lie algebras and Lie groups, but I am not familiar enough with them to give that answer.

Once we have a symmetric bilinear form like $B$, we also have a Clifford algebra $\Cl(V, B)$. This is essentially the associative algebra generated by $V$ subject to the relation $v^2 = B(v,v)$ for all $v \in V$; note that $$ 2B(v, w) = (v + w)^2 - v^2 - w^2, $$ so this algebra also encodes $B$ in its entirety. Every real Clifford algebra is determined up to isomorphism by three nonnegative integers $(p, q, r)$ such that $p + q + r = \dim(V)$ and every orthonormal basis of $V$ can be written $\{e_1, \dotsc, e_p, f_1, \dotsc, f_q, g_1, \dotsc, g_r\}$ where $$ e_i^2 = 1,\quad f_j^2 = -1,\quad g_k^2 = 0, $$ for each $i, j, k$. We denote such a Clifford algebra by $\Cl_{p,q,r}(V)$. There is also a natural identification of $\Cl_{p,q,r}(V)$ with the exterior algebra of $V$, and so we may speak of elements of $\Cl_{p,q,r}(V)$ as multivectors.

In particular, we can form bivector blades $P = v\wedge w$ for $v, w \in V$. This can be thought of as representing the plane spanned by $v$ and $w$, which I will denote by $[P]$. Bivector blades, like vectors, also square to a scalar, and if $v, w$ are orthogonal then $P^2 = -v^2w^2$; thus we can classify $[P]$ via $P^2$: $$\begin{aligned} P^2 < 0 &\iff [P]\text{ is Euclidean or anti-Euclidean,} \\ P^2 = 0 &\iff [P]\text{ is degenerate,} \\ P^2 > 0 &\iff [P]\text{ is hyperbolic,} \end{aligned}$$ where "degenerate" means there is some $x \in [P]$ such that $B(x, y) = 0$ for all $y \in [P]$. We can form general bivectors as sums of bivector blades; not every bivector is a blade: for example, when $\dim(V) = 2m \geq 4$ we can choose $m$ planes which pairwise intersect only at the origin, and the sum of the corresponding blades, such as $e_1\wedge e_2 + e_3\wedge e_4$, is not itself a blade.

The key now is that (some technicalities aside) every orientation-preserving orthogonal transformation is representable as the exponential of a bivector: $$ v \mapsto e^{-X}ve^{X},\quad e^X = \sum_{i=0}^\infty\frac1{i!}X^i, $$ for $X = P_1 + \cdots + P_m$ a sum of bivector blades. An "orientation-preserving orthogonal transformation" is usually considered to be a kind of "rotation". For a blade $P$, we are specifically interested in the case that $P$ is a unit or null, i.e. $P^2 = \pm1$ or $P^2 = 0$, in which case a "rotation" in the plane $[P]$ is achieved by $$ v \mapsto e^{-\theta P/2}ve^{\theta P/2}, $$ where $\theta$ is a scalar, and in particular when $v \in [P]$ we may write $$ v \mapsto ve^{\theta P}. $$ It is now easy to confirm that $$\begin{aligned}\ [P]\text{ is (anti-)Euclidean} &\implies e^{\theta P} = \cos\theta + P\sin\theta, \\ [P]\text{ is hyperbolic} &\implies e^{\theta P} = \cosh\theta + P\sinh\theta, \\ [P]\text{ is null} &\implies e^{\theta P} = 1 + \theta P, \end{aligned}$$ and when $v$ is not degenerate we have $vP = w$ where $w \in V$ is orthogonal to $v$ and $w^2 = -v^2P^2$. Defining $r := |v| = \sqrt{|v^2|}$ and writing $v = rv', w = rw'$, we see $$\begin{aligned}\ [P]\text{ is (anti-)Euclidean} &\implies ve^{\theta P} = (r\cos\theta)v' + (r\sin\theta)w', \\ [P]\text{ is hyperbolic} &\implies ve^{\theta P} = (r\cosh\theta)v' + (r\sinh\theta)w', \\ [P]\text{ is null} &\implies ve^{\theta P} = rv' + r\theta w'. \end{aligned}$$ Note how in the null case it is the coefficient of the null vector that gives us an "angle", and that the bilinear form $B$ fails to capture this. We see that in this case we are translating $v$ by a distance $|v|\theta$ in the $w'$ direction.
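The first two exponential identities can be checked in a concrete matrix model (my own sketch: I use the standard $2\times2$ real matrix representations of $\Cl_{2,0}$ and $\Cl_{1,1}$, and a truncated power series for the matrix exponential; both choices are assumptions for illustration):

```python
import numpy as np

def expm(X, terms=30):
    """Matrix exponential by truncated power series (adequate for small X)."""
    result = np.eye(2)
    term = np.eye(2)
    for i in range(1, terms):
        term = term @ X / i
        result = result + term
    return result

# Matrix model with e1^2 = +1, e2^2 = +1 (a representation of Cl_{2,0}).
e1 = np.array([[1.0, 0.0], [0.0, -1.0]])
e2 = np.array([[0.0, 1.0], [1.0, 0.0]])
P = e1 @ e2  # Euclidean bivector: P^2 = -1
theta = 0.6
assert np.allclose(P @ P, -np.eye(2))
assert np.allclose(expm(theta * P), np.cos(theta) * np.eye(2) + np.sin(theta) * P)

# Matrix model with e1^2 = +1, f^2 = -1 (a representation of Cl_{1,1}).
f = np.array([[0.0, 1.0], [-1.0, 0.0]])
Q = e1 @ f  # hyperbolic bivector: Q^2 = +1
assert np.allclose(Q @ Q, np.eye(2))
assert np.allclose(expm(theta * Q), np.cosh(theta) * np.eye(2) + np.sinh(theta) * Q)
```

In the Euclidean case $e^{\theta P}$ behaves like a complex number of unit modulus, and in the hyperbolic case like a split-complex number, which is exactly why $\cos/\sin$ and $\cosh/\sinh$ appear.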