Distribution of the correlation of a fixed vector with vectors on the n-sphere


This is a homework problem, but my solution does not match my simulation.

Assume a vector $x$ is uniformly distributed on an $n$-sphere of radius $r$, and take a fixed vector $c$ with norm $1$.

Because the correlation $y = c^T x = (Vc)^T(Vx)$ for every unitary matrix $V$, and because $x$ is uniformly distributed (so $Vx$ has the same distribution as $x$), we can choose $V$ with $Vc = c_0$. The distribution of $y$ is then the same as the distribution of $y_0=c_0^Tx$, where $c_0=[1,0,\dots,0]^T$.

In spherical coordinates, $y_0 = r\cos(\theta)$, where $\theta$ is uniform on $[0,\pi]$.

Unfortunately, the histograms I generated for $y$ and $y_0$ are very different (see the figures with a random $c$), although everything looks fine with $c=c_0$. So I suspect that my argument that $y$ has the same distribution as $y_0$ is wrong, but I don't see how!

Note that I generated the uniformly distributed vectors following this definition of spherical coordinates: I computed the Cartesian $x_i$ assuming that each $\phi_i$ is uniform on its range. Is this correct?

[Figure: histograms of $y$ and $y_0$ with a random $c$]


There are 1 best solutions below


The claim that $y_0$ is distributed as $r \cos \theta$ with $\theta$ uniformly chosen from $[0,\pi]$ is only true for the circle (the sphere in $\mathbb R^2$) where we can choose an angle around the circle uniformly at random.

In higher dimensions, $\theta$ is much more likely to be close to $\frac{\pi}{2}$ than to $0$ or $\pi$. Taking $r = 1$, this corresponds to the point being much more likely to be close to the equator $x_1 \approx 0$ than to one of the poles $x_1 \approx \pm 1$. Intuitively, for small $\epsilon > 0$, the region within $\epsilon$ of a pole is approximately a ball of radius $\epsilon$ whose dimension is one lower than that of the ambient space, so it has area $\mathcal O(\epsilon^{n-1})$, but a strip of width $\epsilon$ around the equator has area $\mathcal O(\epsilon)$.
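To make the intuition quantitative, here is a sketch for the unit sphere in $\mathbb R^n$, using the standard formula for its surface measure. The points at angle $\theta$ from the pole form a sphere of dimension $n-2$ and radius $\sin\theta$, so the density of $\theta$ is

$$p(\theta) = \frac{\Gamma\left(\frac n2\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\,\sin^{n-2}\theta, \qquad 0 \le \theta \le \pi.$$

For $n=2$ the factor $\sin^{n-2}\theta$ is constant and $p(\theta) = \frac1\pi$, which is the uniform case. For $n \ge 3$ the density vanishes at $\theta = 0$ and $\theta = \pi$ and peaks at $\theta = \frac\pi2$, more sharply as $n$ grows.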

This paper suggests that for large $n$, if we take $r=\sqrt n$ to normalize the coordinates, $y_0$ approaches a normal distribution.
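One rough way to check the approach to normality numerically is to look at the kurtosis $E[y_0^4]/E[y_0^2]^2$, which equals $3$ for a normal distribution. For a coordinate of a uniform point on the sphere of radius $\sqrt n$, a short moment computation gives $3n/(n+2)$, so it should climb toward $3$. The sketch below assumes NumPy; the helper name `first_coordinate` and the sample sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_coordinate(num_points, n, rng):
    """First coordinate of uniform points on the sphere of radius sqrt(n) in R^n."""
    w = rng.standard_normal((num_points, n))
    return np.sqrt(n) * w[:, 0] / np.linalg.norm(w, axis=1)

kurtosis = {}
for n in (2, 3, 10, 50):
    y0 = first_coordinate(100_000, n, rng)
    # Ratio of fourth moment to squared second moment (3 for a normal variable).
    kurtosis[n] = np.mean(y0**4) / np.mean(y0**2) ** 2
    print(n, round(kurtosis[n], 3), "theory:", round(3 * n / (n + 2), 3))
```

The empirical values should sit close to $3n/(n+2)$, that is $1.5$ for $n=2$ and $1.8$ for $n=3$ (the kurtosis of a uniform distribution).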

Fun fact: if we're in $3$ dimensions, the distribution of a coordinate of a random point on the sphere is uniform. Here are some histograms (I sampled $100000$ points for each.)

In $2$ dimensions (this should match your second histogram):

2d sphere histogram

In $3$ dimensions (this should be a sample from a uniform distribution):

3d sphere histogram

In $4$ dimensions:

4d sphere histogram

In $10$ dimensions (we approach the normal distribution):

10d sphere histogram
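The shapes of these histograms can be read off from the density of $y_0$. As a sketch, start from the standard fact that the angle $\theta$ from the pole has density proportional to $\sin^{n-2}\theta$ and substitute $y = r\cos\theta$, so that $|dy| = r\sin\theta\,d\theta$:

$$f(y) = \frac{\Gamma\left(\frac n2\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)\,r}\left(1 - \frac{y^2}{r^2}\right)^{\frac{n-3}{2}}, \qquad -r \le y \le r.$$

For $n=2$ the exponent is $-\frac12$ and the density blows up at $y = \pm r$ (the arcsine law). For $n=3$ the exponent is $0$ and $f(y) = \frac{1}{2r}$ is uniform. For $n=4$ the exponent is $\frac12$, which gives a semicircle, and for larger $n$ the density becomes increasingly bell-shaped.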


Moreover, generating random points on the sphere by sampling their spherical coordinates uniformly will not work. One way to see this: when you integrate in spherical coordinates, you have to include a Jacobian factor that depends on the angles, so equal ranges of the angles do not correspond to equal areas on the sphere.
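A small experiment shows the bias in $\mathbb R^3$. The question's code is not shown, so the sampler below is only an assumption about what "uniform angles" means there: both angles are drawn uniformly and converted to Cartesian coordinates on the unit sphere. It assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
num_points = 200_000

# Naive (incorrect) sampler in R^3: both angles drawn uniformly.
polar = rng.uniform(0.0, np.pi, num_points)          # angle from the x_1 axis
azimuth = rng.uniform(0.0, 2.0 * np.pi, num_points)  # angle around the x_1 axis
naive = np.column_stack([
    np.cos(polar),
    np.sin(polar) * np.cos(azimuth),
    np.sin(polar) * np.sin(azimuth),
])

# A uniform point on the unit sphere in R^3 has variance 1/3 in every coordinate.
var_per_coord = naive.var(axis=0)
print(var_per_coord)
```

The variances come out near $\frac12, \frac14, \frac14$ instead of $\frac13$ each, so this distribution is not rotationally symmetric. That would explain why the projection onto $c_0$ and the projection onto a random $c$ produce different histograms.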

The standard way to generate a random point on a sphere in $\mathbb R^n$ is to choose $n$ independent standard normal variables, getting a vector $\mathbf W = (W_1, \dots, W_n)$, and then normalize: let $\mathbf x = r \frac{\mathbf W}{\|\mathbf W\|}$. This works because the multivariate normal distribution is rotationally symmetric.
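The recipe above can be sketched in a few lines, assuming NumPy; the function name `sample_sphere` and the parameters $n=4$, $r=2$ are illustrative choices only. The sketch also compares the projections onto $c_0$ and onto a random unit vector $c$, which should agree in distribution when $x$ is sampled this way.

```python
import numpy as np

def sample_sphere(num_points, n, r=1.0, rng=None):
    """Draw points uniformly from the sphere of radius r in R^n."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal((num_points, n))  # rows are i.i.d. standard normal vectors
    return r * w / np.linalg.norm(w, axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, r = 4, 2.0
x = sample_sphere(100_000, n, r, rng)

c0 = np.zeros(n)
c0[0] = 1.0
c = rng.standard_normal(n)
c /= np.linalg.norm(c)  # random unit vector

y0 = x @ c0
y = x @ c

# Compare the two empirical distributions through a grid of quantiles.
qs = np.linspace(0.05, 0.95, 19)
gap = np.max(np.abs(np.quantile(y, qs) - np.quantile(y0, qs)))
print("largest quantile gap:", gap)
print("variances:", y.var(), y0.var(), "expected r^2/n =", r**2 / n)
```

The quantile gap should be small compared with $r$, and both variances should be close to $r^2/n$.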