Warm-up in 3D
Let's place a sphere of radius $r$ at the origin and a smaller sphere of radius $\epsilon$ at $(0, 0, r)$ (i.e., on the intersection of the z-axis with the surface of the larger sphere).
We're interested in their intersection, so any points that jointly satisfy $$ x^2 + y^2 + z^2 = r^2, $$ and $$ x^2 + y^2 + (z-r)^2 = \epsilon^2. $$
I.e.: $$ z = r - \frac{\epsilon^2}{2r}, $$ and $$ x^2 = \epsilon^2 - \frac{\epsilon^4}{4r^2} - y^2. $$
Which suggests a way to sample points uniformly from the intersection: uniformly sample $y \in [-(\epsilon^2-\epsilon^4/4r^2), \epsilon^2 - \epsilon^4/4r^2]$, then compute the corresponding $x$ (sign given by a coin flip).
Generalization
How do we generalize this to higher dimensions, $D$? In a way where the sampling is guaranteed to be uniform over the $D-1$-dimensional intersection? What about when the smaller sphere is placed at an arbitrary point on the surface?
Bonus
I'm interested in this from the context of initializing a weight matrix for a neural network. My aim is to explore what small perturbations to the initial matrix do to the results of training.
In this case, we know that the components of the original matrix are generated i.i.d. from a uniform or normal distribution. I've modified the weight initialization scheme to guarantee that the weight matrix has a fixed norm (equal to the limiting value of Kaiming-He initialization for large matrices). Is there any way to guarantee in general that:
- These perturbations stay on the same hypersurface at a fixed distance (the problem above), and
- the components of the perturbed matrix follow from the same initialization distribution.
Ok here's the right answer.
The intersection of two $d$-dimensional hyperspheres, $S^d$, where one is centered on the other is a $d-1$ dimensional hypersphere, $S^{d-1}$.
So all you have to do is uniformly sample a point on $S^{d-1}$ then rotate it in the higher dimensional space, and shift it to the right spot.
Here's the example of $S^1$:
And here's the example of $S^2$:
Concretely, let $\vec r$ be the location of the center of the second hypersphere, which has radius $\epsilon$.
A little trig (take a look at the example of $S^1$), shows that the angle from $\vec r$ to the cone that goes through the intersection is
$$ \theta = \arccos \left(1 - \frac{\epsilon^2}{2 r^2}\right), $$
where $r = |\vec r|$.
To sample from the intersection, first we use the Muller method (thanks David) to sample a point on $S^{d-1}$ with radius $\epsilon'=r \cos \theta$. Then, we embed it in the higher-dimensional space by adding on coordinate at the end with value $0$.
Next, we have to transform this sphere to make it orthogonal to $\vec r$. For this we can use a Householder reflection to reflect the normal vector of the embedded $S^{d-1}$, $\vec z = (0, \dots, 0, 1)$ onto $\vec r$.
Finally, we translate this $S^{d-1}$ along the vector $\vec r$ so that it intersects both spheres. We have to shift it by $\vec r' = r' \hat r = r' \frac{\vec r}{r}$, where $$ r' = r \cos\theta. $$
Note:
If $\vec r$ is uniformly distributed over the hypersphere it occupies, and if we generate a perturbation uniformly over $S^{d-1}$, then our perturbed vectors will also be uniformly distributed over the original hypersphere.