Number of correlations to make before a large correlation is dubious?

65 Views Asked by At

Suppose you had $m$ vectors each of dimension $n$x$1$, and you compute a Pearson correlation coefficient for every unique pair of them. For example, for one such pair of vectors $x$ and $y$, the computation is:

${\displaystyle r_{xy}={\frac {\sum _{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{{\sqrt {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}{\sqrt {\sum _{i=1}^{n}(y_{i}-{\bar {y}})^{2}}}}}}$

where ${\bar {x}}$ and ${\bar {y}}$ are the means of $x$ and $y$ respectively, and $x_i$ and $y_i$ are the $i$th features of the vectors $x$ and $y$ respectively. Suppose that every pair of vectors is independent (i.e., $x$ is independent of $y$), but that they are not necessarily identically distributed.

Obviously, if you have small number of $n$ features and very large $m$, you will stumble on a pair of vectors that have a large correlation $r$ purely by chance. For a given value of $m$ and $n$ (e.g., m=1000 and n=10), and with no further assumptions about the $m$ vectors and their $n$ features (except that they have finite variance that is greater than zero), is it possible to compute a value for $P( |r_{xy}| > 0.95)?$

The real question I am interested in is this: given a certain number of features $n$, how many $m$ independent vectors should you correlate before you expect to see a large correlation due to chance (e.g., $r$ > 0.95)?

1

There are 1 best solutions below

3
On BEST ANSWER

Suppose you have $N$ nonconstant vectors in $\mathbb R^n$. For each one, subtract the mean and normalize, and you have a unit vector in the $n-1$-dimensional subspace $V$ of $\mathbb R^n$ orthogonal to $(1,\ldots, 1)$. If unit vectors $u$ and $v$ have $u \cdot v < 0.95$, it means $\|u - v\|^2 = 2 - 2 u \cdot v > 0.1$. So the balls of radius $\sqrt{0.1}/2$ about $u$ and $v$ are disjoint. If that's true for all pairs, you have $N$ disjoint balls of radius $\sqrt{0.1}/2$ centred at points of the unit sphere of $V$. If $m$ is the $n-2$-dimensional measure of the intersection of such a ball with the unit sphere, and $M$ the $n-2$-dimensional measure of the sphere, then this is impossible if $N > M/m$.

EDIT: Maybe it will help if we look at the case $n=3$. The subspace $V$ has dimension $2$, so we can draw a picture. The angle between two unit vectors $u$ and $v$ with $u \cdot v = 0.95$ is $\arccos(0.95) \approx 0.31756$. If you want $N$ unit vectors whose dot products with each other are all $< 0.95$, theangle between them must be greater than $\arccos(0.95)$. Now $2\pi/\arccos(0.95)\approx 19.786$. You can take $19$ unit vectors in the plane whose dot products are all $< 0.95$, but there isn't room for $20$. Here's a picture:

enter image description here

The angle between the first vector and the $19$'th is slightly less than $2 \arccos(0.95)$, so you can't put another vector between them.