Approximating Volumes in high dimensions using Monte Carlo

894 Views Asked by At

I am interested in approximating the volume of a subset $S$ of a hypercube in $d$ dimensions through Monte Carlo. The idea would then be that I have a large number of uniformly distributed samples in the hypercube and find the proportion of samples that lie in $S$.

My question is then: suppose that I use 10000 samples in $[0,1]^2$ to approximate the volume of $S_1 \in [0,1]^2$.

Would I get the same level of accuracy if I were to use 10000 samples in $[0,1]^3$ to approximate the volume of $S_2 \in [0,1]^3$?

I'm asking because intuitively, as I increase the dimension, it seems to me that more samples in the hypercube are needed to "fill in the space."

But if you think of it in numbers, computing the density of samples in both cases, it turns out that they're the same since the volume of both hypercubes is 1, regardless of the dimension.

Am I thinking about it from the wrong perspective? Should more samples be used in higher dimensions?

1

There are 1 best solutions below

6
On

Although LutzL concisely referred to the key points in a comment to the question, I thought opening up the explanation a bit. (I believe it is useful to know rough idea and strengths of a method even when you do not immediately need to apply the knowledge, because when you have lots of tools at your disposal, you can choose the most appropriate one to apply. Not having to always resort to a cheap hammer is a good thing, in my opinion.)

suppose that I use 10000 samples in $[0,1]^2$ to approximate the volume of $S_1 \in [0,1]^2$. Would I get the same level of accuracy if I were to use 10000 samples in $[0,1]^3$ to approximate the volume of $S_2 \in [0,1]^3$?

Yes, but only because both are unit volumes (i.e., $1^D = 1$ for all $D \in \mathbb{Z}$, number of dimensions $D \ge 1$).

The Wikipedia page on Monte Carlo integration has a good summary. In short, the error estimate of the volume estimate $Q$ obtained using simple Monte Carlo integration is $$\delta Q \approx V \frac{\delta_N}{\sqrt{N}}$$ i.e. standard error of the mean multiplied by the volume $V$. As mentioned on the Wikipedia page, it is important to realize that this is not a strict error bound, and the actual error may be larger.

Although the Wikipedia page says that the naive Monte Carlo integration is rarely useful, it's not backed by any facts. I've found it to be quite useful in a large number of cases myself.

True, you do not normally wish to sample the entire volume, especially for large volumes; you want to omit the simple subvolumes you know are completely inside or outside, so you only sample the subvolumes you are uncertain of. For example, if you have a shape that is mostly spherical (at average radius $R$) but contains "fuzz" or surface detail or other deviation from the spherical shape that you know is limited to distances $R-r$ to $R+r$ from the center, you would definitely just sample the spherical shell at $R-r$ to $R+r$. (The volume is then the estimated volume of the spherical shell, plus the volume of the assumed $R-r$-radius sphere.) While this may technically fall under importance sampling, I personally think of it as just simple Monte Carlo integration of the interesting subvolumes.

So, based on how I interpret your question, and the last statement,

Am I thinking about it from the wrong perspective? Should more samples be used in higher dimensions?

I'd say you are thinking about it from a valid perspective.

The number of samples gives you a rough relative error estimate of the volume estimate, $1/\sqrt{N}$; $10,000$ samples should give you about two digits of precision in the volume estimate. So, if you need a specific order of magnitude absolute error estimate, then you do need to consider the total sampled volume, and that may be surprisingly huge for large number of dimensions. (Although you could say that is just a limitation in human perception, I guess.)

I'd say your suspicions about higher dimensions are also on the right track, if your suspicions about high number of dimensions is related to the fact that our intuition about the total volume by just looking at the coordinate range involved is only valid for 1, 2, or 3 dimensions, you're on the right track. There is nothing weird about high number of dimensions, except that our intuition does not work, so we must be careful and check -- here, calculate an initial rough estimate, as in order of magnitude, of the volume --, instead of just assuming things we might feel obvious or familiar.

So, be suspicious of assumptions, yes. But, there is no reason to fear or be suspicious about high number of dimension; just be careful and meticulous, so you don't accidentally make stupid assumptions (that might be quite sane and safe in low number of dimensions).