Uniformly Sampling from a High-Dimensional Unit Sphere

887 Views Asked by At

This Question is supposed to gather techniques on how to uniformly sample points on $\mathbb S^{d-1}$ for large $d$. There are a few things to keep in mind for this problem, mainly as the dimension $d$ increases:

  1. The computation of the norm of vectors in $\mathbb R^d$ becomes increasingly expensive. Hence methods relying on normalization of vectors become less efficient.
  2. The volume of the $d$-ball in relation to larger sets containing the $d$-ball decreases. As a consequence, acceptance-rejection sampling methods become less efficient as we reject too many samples.

To give two examples of "naive" methods one might try, which do work well in low dimensions but not in large dimensions, consider the following:

Normalizing a symmetric distribution:

First, sample a random vector $X\in\mathbb R^d$ from a distribution which is invariant under symmetry transformations of $\mathbb S^{d-1}$, then normalize $X$, i.e. compute $X/\Vert X\Vert$. This normalized vector will be uniformly distributed on $\mathbb S^{d-1}$. This method suffers from Problem 1.

Rejecting points from the unit cube:

Sample a random vector $X$ uniformly in $[-1,1]^d$. If the random vector lies inside the $d$-ball, keep the sample, if not, discard the sample. This will generate a uniform distribution on the $d$-ball. A uniform distribution on $\mathbb S^{d-1}$ can then be achieved by normalizing the samples. This acceptance-rejection method suffers from Problem 2, since as $d$ increases, fewer points in the unit cube are located in the unit $d$-ball and hence we will reject a lot of samples. (This method also suffers from Problem 1, but that's beside the point.)

Question:

What are sampling techniques that achieve a uniform distribution on $\mathbb S^{d-1}$ but avoid Problems 1 and 2 for large $d$?

2

There are 2 best solutions below

6
On

The best solution is to pick the integer N such that 2N = d or 2N = d+1, and then to use standard algorithms to pick a standard normal random variable from the plane N times independently. This yields a standard normal choice in d-dimensional euclidean space, which divided by its length will be a uniformly chosen point on S^(d-1).

2
On

I agree with Dan Asimov that sampling a $d$-dimension standard normal and then renormalizing is the best, even if the original poster has ruled it out a priori.

Here is a different method which produces the coordinates $x_i$ sequentially and never re-examines or re-calculates them. Let $B_n$ denote the distribution of the first coordinate of a random point in $S^{n-1}$: it is a beta distribution on $[-1,1]$. (I think the $B_n$ distribution is that of $\pm \sqrt Y$ when $Y\sim\operatorname{Beta}(1,n-1)$. Special cases are: $B_3$ is uniform distribution on $[-1,1]$, and $B_1$ is uniform on $\{\pm1\}$.) Here is the method. Pick $x_1$ according to $B_d$. Pick $x_2$ by multiplying a $B_{d-1}$ variate by $\sqrt{1-x_1^2}$. Pick $x_{i+1}$ by multiplying a $B_{d-i}$ variate by $\sqrt{1-(x_1^2+\cdots x_i^2)}$. (One keeps track of the sums $x_1^2+\cdots x_i^2$ by updating with the square of the new coordinate $x_{i+1}^2$ and so on.)

Of course this is silly because of the cost of generating the $B_n$ variates and (as David K points out in a comment) extracting the square roots is surely too high. But it does produce the desired coordinates in a "on-line" fashion, not needing to look back at old coordinates.