How many $\mathbb R^n$ vectors are needed to guarantee a large correlation between at least a pair of them - an open problem in mathematics?

187 Views Asked by At

Suppose you had $m$ vectors each of dimension $n$x$1$, and you compute a Pearson correlation coefficient for every unique pair of them. For example, for one such pair of vectors $x$ and $y$, the computation is:

${\displaystyle r_{xy}={\frac {\sum _{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{{\sqrt {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}{\sqrt {\sum _{i=1}^{n}(y_{i}-{\bar {y}})^{2}}}}}}$

where ${\bar {x}}$ and ${\bar {y}}$ are the means of $x$ and $y$ respectively, and $x_i$ and $y_i$ are the $i$th features of the vectors $x$ and $y$ respectively. Suppose that every pair of vectors is independent (i.e., $x$ is independent of $y$), but that they are not necessarily identically distributed.

Given a certain number of features $n$, how many $m$ independent vectors should you correlate before you must see a large correlation between at least one pair of them (e.g., $r$ > 0.95)?

A few days ago, I posed a version of this problem and @Robert Israel gave me an interesting perspective on the matter:

For any pair $x,y$, you can compute the following two unit vectors: $u = {(x_{i}-{\bar {x})/}{{\sqrt {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}}}$ and $v = {(y_{i}-{\bar {y})/}{{\sqrt {\sum _{i=1}^{n}(y_{i}-{\bar {y}})^{2}}}}}$

These are unit vectors whose dot product is $r_{xy}$ by definition. Moreover, the dot product of $u$ and the vector $(1,1,...,1)$ in $\mathbb R^n$ is just $\sum _{i=1}^{n}{(x_{i}-{\bar {x})/}{{\sqrt {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}}} = 0$, and so $u$ is orthogonal to that vector. So is the dot product of $v$ and $(1,...,1)$. Therefore $u$ and $v$ are both orthogonal to the vector $(1,...,1)$. This means that $u$ and $v$ must lie in the same $n-1$ dimensional plane.

Finding a pair of vectors with a large correlation $r_{xy}$, is therefore same thing as finding vectors where the angle between their corresponding unit vectors $u$ and $v$ has a large cosine.

(Because $r_{xy} = u \cdot v = ||u||*||v||*\cos\theta = \cos\theta$)

Here is an easy version of this problem: How many $\mathbb R^3$ vectors do I need before I guarantee a correlation coefficient of $0.95$ between at least a pair of them? For each $\mathbb R^3$ vector, the unit vectors $u,v$ are all in the plane perpendicular to $(1,1,1)$. And the question is: what is the maximum number of vectors I can fit in that plane that are at least an angle $\theta = \cos^{-1}(0.95)$ apart? The answer is just $2\pi/cos^{-1}(0.95) = 19.786$ vectors. So if I have $20$ $\mathbb R^3$ vectors, I am guaranteed at least a pair that have a correlation $r \geq.95$

Now a harder version of this problem: how many $\mathbb R^n$ vectors do you need before you're guaranteed a correlation of at least $0.95$ between at least a pair of them? If I'm not mistaken, this is the same as asking how many $\mathbb R^{n-1}$ vectors can you fit in an $n-1$ dimensional plane that are at least an angle $\theta = cos^{-1}(0.95)$ apart. I was told that this is equivalent to the spherical code problem, which is an open problem in mathematics. Is that true?