I have a function $f\colon \mathbb{R}^{20} \to \mathbb{R}$ and a $20$-dimensional Gaussian random variable $X \sim \mathcal{N}(\mu,\Sigma)$.
I want to evaluate $\mathbb{E}\left(f(X)\right)$.
There is no closed-form solution for this integral, so one possible method is to generate i.i.d. Gaussian samples $X_1, X_2, \dots, X_k$ and approximate the expectation by $\frac{1}{k}\sum_{i=1}^k f(X_i)$.
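A minimal sketch of this plain Monte Carlo estimator (here `f`, `mu` and `Sigma` are placeholders for illustration only):

```python
import numpy as np

d, k = 20, 500
rng = np.random.default_rng(0)
mu, Sigma = np.zeros(d), np.eye(d)                # placeholder parameters

def f(x):                                         # placeholder target function
    return np.prod(x, axis=-1)

X = rng.multivariate_normal(mu, Sigma, size=k)    # draws X_1, ..., X_k
estimate = f(X).mean()                            # (1/k) * sum_i f(X_i)
```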
The problem is that, for practical reasons, I cannot make $k$ very large (only around $500$), which I think is quite small for 20 dimensions.
My question: Is there a smart way of choosing or randomly generating these $500$ sample points?
I guess one needs to choose points tailored specifically to the function $f$. Let's assume that in my case $f(y) = \prod_{i=1}^{20} y_i$.
Could you please demonstrate how I should choose points in this case? I have skimmed through a couple of books on quasi-Monte Carlo, and the authors write about low-discrepancy sequences, stratified sampling, etc., but I haven't seen a single practical application of this theory to a function like the one in my question.
Finding the "best" points for numerical integration is a difficult problem. This is evidenced by the fact that it has been studied since the Babylonians, and to this day no general rule for all types of problems is available.
I will provide example code for five common and straightforward strategies that can be applied "out of the box" to the target function you provided. One word of warning: unless your actual function is truly representative of the example you gave, the results you achieve with these methods may be much worse. More sophisticated techniques exist, but they are highly problem-specific and require a substantial investment of time and effort; so much so that in a professional context the decision is often to invest in better hardware instead.
I assume standard normals, i.e. mean zero, unit variance and uncorrelated components, since this is the most difficult case for the estimation and the easiest from a theoretical perspective. For example, it is now easy to see that the true value of the expectation of your target function is zero: by independence, $\mathbb{E}\left[\prod_{i=1}^{20} X_i\right] = \prod_{i=1}^{20} \mathbb{E}[X_i] = 0$.
Each of the methods is randomised. To compare performance consistently and to measure the variance of the estimate, I apply each method to 1000 i.i.d. runs, each on a sample of size 500. This gives me 1000 estimates of the expected value, and I report the standard deviation of these estimates. In addition I show the overall mean, which is only useful for detecting biased estimators and otherwise uninteresting.
The five methods

The competitors are: plain Monte Carlo ('MC'), antithetic variates ('anti'), Latin hypercube sampling ('LHC'), a randomised Halton low-discrepancy sequence ('halton') and a whitening transform ('White').
The code
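A self-contained Python/NumPy/SciPy sketch of the five strategies. The labels match the results discussed below; reading 'White' as a moment-matching/whitening transform (forcing the sample to have exactly zero mean and identity covariance) is an assumption on my part, and the number of runs is reduced from 1000 for a quick demonstration:

```python
import numpy as np
from scipy.stats import norm, qmc

d, n, runs = 20, 500, 100    # the text uses 1000 runs; fewer here for speed

def f(x):                    # target function: product of the 20 coordinates
    return np.prod(x, axis=1)

def mc(rng):                 # plain Monte Carlo
    return f(rng.standard_normal((n, d))).mean()

def anti(rng):               # antithetic variates: pair each draw with its mirror image
    x = rng.standard_normal((n // 2, d))
    return f(np.vstack([x, -x])).mean()

def lhc(rng):                # Latin hypercube sample pushed through the normal quantile
    u = qmc.LatinHypercube(d=d, seed=rng).random(n)
    return f(norm.ppf(u)).mean()

def halton(rng):             # scrambled Halton sequence (randomised quasi-Monte Carlo)
    u = qmc.Halton(d=d, scramble=True, seed=rng).random(n)
    return f(norm.ppf(u)).mean()

def white(rng):              # moment matching: whiten to exact mean 0, covariance I
    x = rng.standard_normal((n, d))
    x = x - x.mean(axis=0)
    L = np.linalg.cholesky(np.cov(x, rowvar=False))
    return f(np.linalg.solve(L, x.T).T).mean()

methods = {'MC': mc, 'anti': anti, 'LHC': lhc, 'halton': halton, 'White': white}
for name, method in methods.items():
    est = np.array([method(np.random.default_rng(i)) for i in range(runs)])
    print(f"{name:8s} mean {est.mean():+10.2e}   stdev {est.std():10.2e}")
```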
Results and discussion
My runs produced this output
As is to be expected given the target function, the 'White' method performs superbly, while antithetic variates ('anti') perform worst, even worse than plain 'MC'. The latter is no surprise either: since the dimension is even, $f(-y) = (-1)^{20} f(y) = f(y)$, so each antithetic partner merely duplicates a function value and half the sample is wasted. Notice the orders-of-magnitude differences between the methods. I was surprised by the good performance of 'halton'; in other experiments quasi-Monte Carlo has not worked well for me in such a high-dimensional setting. The performance of 'LHC' is more in line with my expectations.
To translate this into sample sizes: to reduce the standard deviation of plain-vanilla MC $n$-fold, you need $n^2$ times as many samples. So the reduction of the standard deviation from 1.45e-01 ('MC') to 1.56e-02 ('LHC'), a ratio of about $9.3$, corresponds to roughly a factor 100 in sample size. The reduction achieved by 'halton' amounts to roughly a factor 700.
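A quick back-of-the-envelope check of the sample-size argument, using the two standard deviations quoted above (a stdev ratio $r$ corresponds to a factor $r^2$ in Monte Carlo sample size):

```python
# stdev values quoted in the text for 'MC' and 'LHC'
sd_mc, sd_lhc = 1.45e-01, 1.56e-02
factor = (sd_mc / sd_lhc) ** 2   # equivalent sample-size multiplier
print(round(factor))             # about 86, i.e. roughly a factor 100
```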