When working on variance reduction techniques, I was studying stratified sampling.
Suppose we wanted to estimate a definite integral, and we decided to do so using classical Monte Carlo.
It can be shown that stratified sampling reduces the overall variance of our estimator, but I don't see intuitively why this is true.
In classical Monte Carlo, we sample points uniformly from the interval, evaluate the function at them, and take the average (scaled by the interval length).
In stratified sampling, we partition the interval into strata, collect samples from each stratum, and then combine our results.
So my question is: how does stratified sampling reduce the variance? I can see why the variance might be smaller within a particular stratum, but I don't see why the sum of the per-stratum estimates has a lower overall variance.
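To make the two procedures concrete, here is a minimal sketch of both estimators (the function names and the choice of equal-width strata are mine, not from the question):

```python
import random

def classical_mc(f, a, b, n):
    # Classical Monte Carlo: average f at n uniform points on [a, b],
    # scaled by the interval length.
    return (b - a) * sum(f(random.uniform(a, b)) for _ in range(n)) / n

def stratified_mc(f, a, b, n_strata, n_per_stratum):
    # Stratified sampling: split [a, b] into equal-width strata, estimate
    # each stratum's integral separately, and sum the per-stratum estimates.
    h = (b - a) / n_strata
    total = 0.0
    for k in range(n_strata):
        lo = a + k * h
        s = sum(f(random.uniform(lo, lo + h)) for _ in range(n_per_stratum))
        total += h * s / n_per_stratum
    return total
```

Note that the stratified version spends its total sample budget ($n_{\text{strata}} \times n_{\text{per stratum}}$ points) spread evenly across the interval, which is exactly the "non-clumping" effect discussed below.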
It forces a certain degree of non-clumping of the points that you actually use for quadrature. For example, suppose you are integrating $f(x)=x$ on $[-1,1]$. With regular Monte Carlo, it is possible that all $n$ of your integration points land in $[0,1]$ (this has probability $2^{-n}$, of course, but still). In those cases your quadrature result will be significantly larger than the desired result of $0$, which contributes variance. But if you take strata on $[-1,0]$ and $[0,1]$, for example, then at least the two strata themselves (in this example, the negative and positive values of the integrand) will be equally represented. More formally, by the law of total variance, $\operatorname{Var}[f(U)] = \mathbb{E}[\operatorname{Var}(f(U)\mid S)] + \operatorname{Var}[\mathbb{E}(f(U)\mid S)]$, where $S$ is the stratum containing $U$; stratified sampling with proportional allocation eliminates the second (between-strata) term, so its variance can never exceed that of classical Monte Carlo.
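You can check this numerically for the $f(x)=x$ example above. For a uniform $U$ on $[-1,1]$, the classical estimator with $n$ points has variance $4/(3n)$, while the two-strata estimator with $n/2$ points per stratum has variance $1/(3n)$, a factor-of-four reduction. The sketch below (my own naming; a rough empirical check, not a rigorous one) estimates both variances by repeating each estimator many times:

```python
import random

def mc_estimate(n):
    # Classical Monte Carlo for f(x) = x on [-1, 1]:
    # average of n uniform points, scaled by the interval length 2.
    return 2.0 * sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n

def stratified_estimate(n):
    # Two strata [-1, 0] and [0, 1], n // 2 points each; each stratum has
    # length 1, so the integral estimate is the sum of the two stratum means.
    half = n // 2
    neg = sum(random.uniform(-1.0, 0.0) for _ in range(half)) / half
    pos = sum(random.uniform(0.0, 1.0) for _ in range(half)) / half
    return neg + pos

def empirical_variance(estimator, n, trials):
    # Run the estimator repeatedly and compute the sample variance of
    # the resulting integral estimates.
    xs = [estimator(n) for _ in range(trials)]
    m = sum(xs) / trials
    return sum((x - m) ** 2 for x in xs) / trials

random.seed(1)
v_mc = empirical_variance(mc_estimate, 20, 2000)
v_strat = empirical_variance(stratified_estimate, 20, 2000)
# The theoretical values for n = 20 are 4/60 and 1/60 respectively,
# so the stratified variance should come out roughly 4x smaller.
print(v_mc, v_strat)
```

Both estimators have the same mean (the true integral, $0$) and use the same number of function evaluations; only the spread of the estimates differs.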