What definition of "nearly orthogonal" would result in "In a 10,000-dimensional space there are millions of nearly orthogonal vectors"?


Quanta Magazine's April 13, 2023 article "A New Approach to Computation Reimagines Artificial Intelligence" starts with:

By imbuing enormous vectors with semantic meaning, we can get machines to reason more abstractly — and efficiently — than before.

Later on, during the explanation are the paragraphs:

The vectors must be distinct. This distinctness can be quantified by a property called orthogonality, which means to be at right angles. In 3D space, there are three vectors that are orthogonal to each other: One in the x direction, another in the y and a third in the z. In 10,000-dimensional space, there are 10,000 such mutually orthogonal vectors.

But if we allow vectors to be nearly orthogonal, the number of such distinct vectors in a high-dimensional space explodes. In a 10,000-dimensional space there are millions of nearly orthogonal vectors.

I remember reading previous questions here where high dimensions and dot products are discussed, and seeing comments about how easy it is to get very small or even zero dot products in high dimensions, but I've never worked outside of one-, two- and three-dimensional problems.

Question: What definition of "nearly orthogonal" would result in "In a 10,000-dimensional space there are millions of nearly orthogonal vectors"? Would it be, for example, a dot product¹ smaller than some number like 0.1?


¹ Of the presumably normalized vectors.

There are 2 answers below.

Based on discussion and references in the comments (principally from user L.F.) it appears that, as I guessed, two vectors are considered “nearly orthogonal” if their dot product is “small”: that is, $a$ and $b$ are nearly orthogonal if, in the context of a particular value $\epsilon$, we have $$-\epsilon \le \frac{a\cdot b}{\lvert a\rvert \lvert b \rvert} \le \epsilon.\tag{$\star$}$$ The smaller the value of $\epsilon$, the stricter the requirement imposed by “nearly orthogonal”. As $\epsilon$ goes to zero, the meaning of “nearly orthogonal” approaches actual orthogonality.
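For concreteness, condition $(\star)$ is easy to check numerically. A minimal sketch (the helper names `cosine` and `nearly_orthogonal` are mine, not from the post):

```python
import math

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearly_orthogonal(a, b, eps):
    """True if a and b satisfy condition (star): |cos(angle)| <= eps."""
    return abs(cosine(a, b)) <= eps

# Exactly orthogonal vectors pass for any eps >= 0 ...
print(nearly_orthogonal([1, 0, 0], [0, 1, 0], eps=0.1))    # True
# ... while a slightly tilted pair fails once eps is strict enough.
print(nearly_orthogonal([1, 0.05, 0], [0, 1, 0], eps=0.01))  # False
```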

This Math Overflow post asks how, for given $n$ and $\epsilon$, one can find a large family of vectors from $\Bbb R^n$ that are nearly orthogonal in this sense. The top answer there, by Bill Johnson, cites the so-called Johnson-Lindenstrauss lemma and claims that you can find a family of $k$ nearly-orthogonal vectors if

$$n\ge C \epsilon^{-2} \log k$$

where $C$ is a fixed constant no larger than $8$. Bill Johnson is (or at least purports to be) one of the namesakes of the Johnson-Lindenstrauss lemma, so the answer is likely to be reliable.

Turning this around, we have that, given $n$ and $\epsilon$, one can find at least $$e^{n\epsilon^2/8}$$ nearly-orthogonal vectors. Note that the appearance of $e$ here is rather arbitrary, as its value can be absorbed into the $C$. For the specific case of $k \approx 10^6, n=10000$ that you asked about, we find that $\epsilon = 0.12$ is sufficient to find many millions of nearly-orthogonal vectors, but $\epsilon = 0.1$ may not be.
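Plugging numbers into this lower bound $e^{n\epsilon^2/8}$ makes the $\epsilon = 0.12$ vs. $\epsilon = 0.1$ comparison concrete (a sketch; the function name is mine):

```python
import math

def jl_family_size(n, eps, C=8.0):
    """Lower bound e^(n * eps^2 / C) on the number of nearly-orthogonal
    vectors guaranteed by the Johnson-Lindenstrauss argument above."""
    return math.exp(n * eps ** 2 / C)

print(f"{jl_family_size(10000, 0.12):.3g}")  # ~6.6e7: many millions
print(f"{jl_family_size(10000, 0.10):.3g}")  # ~2.7e5: not millions
```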

(Beware; some of the answers seem to consider the less strict constraint that the dot product lie in $[-1, \epsilon]$ rather than in $[-\epsilon, \epsilon]$, and I have not thought carefully about how this will affect the results. For large $n$, not too much, I think.)

A reply by Timothy Gowers explains why this result is plausible: the vectors in $\Bbb R^n$ lie on the unit $(n-1)$-sphere, and each vector can be thought of as excluding a portion of this sphere that is proportional to $(1-\epsilon)^{n-1}$.

Separate answers by Ryan O'Donnell and by ‘kodlu’ provide a method for locating $2^{O(\epsilon^2n)}$ nearly-orthogonal unit vectors: simply select random vectors whose components are $\pm \frac1{\sqrt n}$; by a probabilistic argument these will usually be nearly-orthogonal.
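That random construction is easy to try out. A minimal sketch in plain Python (the seed and sample sizes are my choices, not from the answers):

```python
import random

random.seed(0)
n = 10000  # dimension

def random_sign_vector(n):
    """Unit vector with components chosen uniformly from {+1/sqrt(n), -1/sqrt(n)}."""
    s = n ** -0.5
    return [random.choice((-s, s)) for _ in range(n)]

vectors = [random_sign_vector(n) for _ in range(20)]

# Each pairwise dot product is a sum of n terms equal to +-1/n, so it has
# mean 0 and standard deviation 1/sqrt(n) = 0.01 here; with overwhelming
# probability every pair is far below, say, 0.1.
worst = max(abs(sum(x * y for x, y in zip(u, v)))
            for i, u in enumerate(vectors) for v in vectors[i + 1:])
print(worst)
```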

Disclaimer: I tried to summarize the MO discussion, but I did not think about any of it carefully, so I may have gotten the details wrong. Jelani Nelson suggests consulting Problems and Results in Extremal Combinatorics Part I, by Noga Alon, for details. This is currently available online from Professor Alon's web site at Tel Aviv University. The pertinent part seems to be section 9.


Coding-theoretic methods (or number-theoretic, if you prefer), most notably constructions relying on exponential sums over finite fields, are a rich source of such families of vectors. I will describe one very flexible construction and list its relevant parameters afterwards. The gist is that in $n$-dimensional space we end up with $n^k$ unit vectors with all the pairwise inner products bounded in absolute value by $M/\sqrt n$, with the constant $M$ increasing linearly with $k$.

I will consider binary vectors (all entries $\pm1$) in what follows. We can use larger sets of complex roots of unity instead, if so desired.


Let $q=2^m$ and $F=\Bbb{F}_q$. Let $e:F\to\{\pm1\}$ be a non-trivial additive character (= a homomorphism from the additive group to the multiplicative group $\{\pm1\}$). A somewhat canonical such beast is $$e(x)=(-1)^{tr(x)},$$ where $tr:F\to\Bbb{F}_2$ is the trace map, $tr(x)=x+x^2+x^4+x^8+\cdots+x^{2^{m-1}}$.

Given any polynomial $f(x)\in F[x]$ we get a $q$-dimensional vector $w_f$ with components indexed by $x\in F$ with the recipe $$ w_f(x)=e(f(x)) $$ for all $x\in F$. So we see that $\lVert w_f\rVert^2=q$ because all the entries are $\pm1$.

A most general tool for finding good families is to constrain the choice of the polynomial $f(x)$ by giving an upper bound on its degree. Another constraint we need is to disallow even-degree terms. This comes from the fact that for all $x\in F$ we have $tr(x)=tr(x^2)$. So let's fix a positive integer $N$, and consider the collection of polynomials $$ C(N)=\{a_0x+a_1x^3+a_2x^5+\cdots+a_{N-1}x^{2N-1}\in F[x]\mid \text{$a_i$ arbitrary elements of $F$ for all $i$}\}. $$ We see that there are $q^N$ polynomials in this set.

If $f(x)=a_0x+\cdots+a_{N-1}x^{2N-1}$ and $g(x)=b_0x+\cdots+b_{N-1}x^{2N-1}$ are distinct polynomials from $C(N)$, then the inner product is $$\langle w_f,w_g\rangle=\sum_{x\in F}e(f(x)+g(x)),\qquad(*)$$ using the homomorphism property of $e$. Here the sum polynomial $f(x)+g(x)$ is a non-zero polynomial of degree at most $2N-1$ and without even-degree terms. The Weil-Carlitz-Uchiyama bound then says that the character sum in $(*)$ has absolute value at most $(2N-2)\sqrt q$.

Recall that all the vectors $w_f$ have squared Euclidean norm $q$, so this bound on character sums implies that $$ |\cos(\theta(f,g))|\le \frac{2N-2}{\sqrt q}.\qquad(**) $$ The inequality $(**)$ then means that, with $q$ large enough in comparison to $2N-2$, the angles $\theta(f,g)$ between the vectors $w_f$ and $w_g$, $f\neq g$, are very close to $\pi/2$. This is why I think the construction fits.
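The whole construction is small enough to verify by computer for a toy field. Below is a sketch for $m=4$ (so $q=16$), realizing $\Bbb F_{16}$ via the irreducible polynomial $x^4+x+1$; all names and the choice of field are mine. It checks that every non-zero $h\in C(2)$ has $\bigl|\sum_x e(h(x))\bigr|\le(2N-2)\sqrt q=8$, which by $(*)$ is exactly the bound on $\langle w_f,w_g\rangle$ for distinct $f,g\in C(2)$.

```python
M, Q, POLY = 4, 16, 0b10011  # F_16 as F_2[x] / (x^4 + x + 1)

def gf_mul(a, b):
    """Multiplication in F_16: carry-less multiply with reduction mod POLY."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & Q:      # degree reached 4: reduce
            a ^= POLY
    return r

def trace(x):
    """tr(x) = x + x^2 + x^4 + x^8; lands in the subfield F_2, i.e. {0, 1}."""
    t, y = 0, x
    for _ in range(M):
        t ^= y               # field addition is XOR
        y = gf_mul(y, y)     # Frobenius: y -> y^2
    return t

def e(x):
    """Additive character e(x) = (-1)^tr(x)."""
    return -1 if trace(x) else 1

def char_sum(a0, a1):
    """Sum of e(h(x)) over all x in F, for h(x) = a0*x + a1*x^3 in C(2)."""
    return sum(e(gf_mul(a0, x) ^ gf_mul(a1, gf_mul(x, gf_mul(x, x))))
               for x in range(Q))

sums = [char_sum(a0, a1) for a0 in range(Q) for a1 in range(Q)
        if (a0, a1) != (0, 0)]
# Weil-Carlitz-Uchiyama: |sum| <= (2N-2) * sqrt(q) = 2 * 4 = 8 for all of them.
print(max(abs(s) for s in sums))
```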


Some example numbers.

  • With $m=14$ we have $q=16384$, so the vectors have dimension $16384$. Choosing $N=2$ we get a family of $q^N=2^{28}\approx268\,\text{million}$ vectors such that the cosines between any two of them have absolute value $\le 2/\sqrt{q}=1/64$.
  • With $m=20$ we have $q=1048576$, $\sqrt q=1024$. With $N=3$ we get a family of $q^N=2^{60}\approx10^{18}$ vectors in $\{\pm1\}^{1048576}$ with the cosines of the angles between them bounded in absolute value by $4/1024\approx0.004$.
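Both bullets follow directly from the family size $q^N$ and the bound $(**)$; a quick arithmetic check (variable names are mine):

```python
for m, N in [(14, 2), (20, 3)]:
    q = 2 ** m
    family_size = q ** N                 # number of vectors, q^N
    cos_bound = (2 * N - 2) / q ** 0.5   # bound (**) on |cos(theta)|
    print(f"m={m}, N={N}: {family_size:.3g} vectors, |cos| <= {cos_bound}")
```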

With small values of $N$ there are number-theoretic quirks (the theory of quadratic and bilinear forms over $F$ comes to the fore) allowing a saving of a factor of $\sqrt2$ in the W-C-U bound, and hence also in the upper bound on the cosines. However, those apply only when $m$ satisfies certain congruence conditions.