Numerical evaluation of expectation

I have a function $f: \mathbb{R}^{20} \to \mathbb{R}$ and a $20$-dimensional Gaussian random variable $X \sim \mathcal{N}(\mu,\Sigma)$.

I want to evaluate $\mathbb{E}\left(f(X)\right)$.

There is no closed-form solution for this integral, so one possible method is to generate i.i.d. Gaussian samples $X_1, X_2, \ldots, X_k$ and approximate the expectation by $\frac{1}{k}\sum_{i=1}^k f(X_i)$.
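To make the setup concrete, here is a minimal sketch of that estimator; `f`, `mu` and `Sigma` below are illustrative placeholders, not my actual problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder problem data: mu, Sigma and f stand in for the real ones
d = 20
mu = np.zeros(d)
Sigma = np.eye(d)

def f(x):
    # illustrative stand-in; acts row-wise on a (k, d) array of samples
    return np.tanh(x).sum(axis=1)

# Plain Monte Carlo: draw k samples and average f over them
k = 500
X = rng.multivariate_normal(mu, Sigma, size=k)
estimate = f(X).mean()
```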

The problem is that for various reasons I can't make $k$ very large, only around $500$, and I think that is quite small for $20$ dimensions.

My question: is there some smart way of choosing or generating these $500$ sample points?

I guess one needs to choose points tailored specifically to the function $f$. Let's assume that in my case $f(y) = \prod_{i=1}^{20} y_i$.

Could you please demonstrate how I should choose the points in this case? I have skimmed through a couple of books on quasi-Monte Carlo; the authors write about low-discrepancy sequences, stratified sampling, etc., but I haven't seen a single practical application of this theory to a function like the one in my question.

Finding the "best" points for numerical integration is a difficult problem. This is evidenced by the fact that it has been studied since the Babylonians, and to this day no general rule for all types of problems is available.

I will provide example code for five common and straightforward strategies that can be applied "out of the box" to the target function you provided. One word of warning: unless your actual function is truly representative of the example you gave, the results you achieve with these methods may be much worse. More sophisticated techniques exist, but they are highly problem-specific and require a substantial investment of time and effort; so much so that in a professional context the decision is often to invest in better hardware instead.

I assume standard normals, i.e. mean zero, variance one and uncorrelated, since this is the most difficult case for the estimation and the easiest from a theoretical perspective. For example, it is now easy to see that the true value of the expectation of your target function is zero: by independence, $\mathbb{E}\left[\prod_{i=1}^{20} X_i\right] = \prod_{i=1}^{20}\mathbb{E}[X_i] = 0$.

Each of the methods is randomised. To compare performance consistently and to measure the variance of the estimate, I apply each method in 1000 i.i.d. runs, each on a sample of size 500. This gives 1000 estimates of the expected value, and I report their standard deviation. In addition I show the overall mean; this is only useful for detecting biased estimators and is otherwise uninteresting.

The five methods

  1. MC: plain vanilla Monte Carlo; this serves as the base case.
  2. Halton: quasi-random numbers. (See halton)
  3. Anti: variance reduction by antithetic variables. (See anti) This is a common technique for random walks.
  4. LHC: Latin hypercube design, an example of a space-filling design. (See lhc)
  5. Whitened: transforms the random sample so that the first and second empirical moments match their theoretical values exactly. Relies on the Cholesky factorisation. (See chol)
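A side remark, assuming a reasonably recent SciPy on your end: versions 1.7 and later ship `scipy.stats.qmc`, so the Halton and Latin hypercube designs can also be generated without `diversipy`. A sketch:

```python
import numpy as np
from scipy.stats import qmc
from scipy.special import ndtri  # inverse standard normal CDF

nsim, nd = 500, 20

# Scrambled Halton points in (0, 1)^nd, pushed through the inverse CDF
des_halton = ndtri(qmc.Halton(d=nd, scramble=True, seed=42).random(nsim))

# Latin hypercube design, same inverse-CDF transform
des_lhc = ndtri(qmc.LatinHypercube(d=nd, seed=42).random(nsim))
```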

The code

import numpy as np
import pandas as pd
import scipy.special
import diversipy

nsim = 500
nd = 20
nrun = 1000

#This is the target function given by the OP
def fun_target(x):
    return x.prod(axis=1)

#Run-Loop for all designs
valsammler = []
for irun in range(nrun):
    if irun % 10 == 0:
        print(irun)

    #define designs
    # 1) Naive Monte Carlo
    des_mc = np.random.normal(size=(nsim,nd))
    # 2) Halton design
    des_halton = scipy.special.ndtri(
        diversipy.hycusampling.halton(nsim, nd, skip=np.random.randint(1,50000)))
    # 3) Anti-thetic variables
    des_anti = np.r_[des_mc[:(nsim//2),:], -des_mc[:(nsim//2),:] ]
    # 4) Latin Hyper Cube
    des_lhc = scipy.special.ndtri(
            (diversipy.hycusampling.lhd_matrix(nsim, nd) + 1)/(nsim + 1))
    # 5) Whitened normals
    X = des_mc.copy()  #copy so the plain MC sample stays untouched
    #Centre the sample, i.e. remove deviations from the theoretical mean
    X -= X.mean(axis=0)
    #calculate cholesky factorisation of the empirical covariance
    L = np.linalg.cholesky(np.cov(X, rowvar=False, ddof=1))
    #remove deviations from the correlations; transpose back to samples in rows
    des_white = np.linalg.solve(L, X.T).T

    # get estimate of target's expectation
    val_halton = np.mean(fun_target(des_halton))
    val_mc = np.mean(fun_target(des_mc))
    val_anti = np.mean(fun_target(des_anti))    
    val_lhc = np.mean(fun_target(des_lhc))    
    val_white = np.mean(fun_target(des_white)) 

    valsammler.append([val_mc, val_halton, val_anti, val_lhc, val_white] )

tt = np.array(valsammler)
m = np.mean(tt, axis=0)
s = np.std(tt, axis=0)
restbl = pd.DataFrame({'mean': m, 'stdev': s}, 
                      index=['MC', 'Halton', 'Anti', 'LHC', 'White'])
print(restbl)

Results and discussion

My runs produced this output:

                 mean          stdev
MC      -2.610587e-03   1.482639e-01
Halton   5.249188e-04   8.528158e-03
Anti    -7.709898e-03   2.985108e-01
LHC      1.455367e-04   1.564775e-02
White   7.125519e-112  1.704653e-110

As is to be expected given the target function, the 'White' method performs best by far, while antithetic variates ('Anti') are worst, even worse than plain 'MC'. This is no accident: with an even number of factors we have $f(-y) = f(y)$, so each antithetic pair merely duplicates a value and halves the effective sample size. Notice the huge differences between the methods, spanning several orders of magnitude. I was surprised by the good performance of 'Halton'; in other experiments quasi-Monte Carlo has not worked well for me in such a high-dimensional setting. The performance of 'LHC' is more in line with my expectations.

To translate this into sample sizes: to reduce the standard deviation of plain vanilla MC $n$-fold, you need $n^2$ times as many samples. So the reduction of the stdev from 1.48e-01 (MC) to 1.56e-02 (LHC) is worth roughly a factor of 90 in sample size, and the reduction achieved by Halton roughly a factor of 300.
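These factors can be sanity-checked directly from the standard deviations in the table:

```python
# Standard deviations copied from the results table above
s_mc, s_halton, s_lhc = 1.482639e-01, 8.528158e-03, 1.564775e-02

# MC stdev scales like 1/sqrt(n), so an n-fold reduction costs n^2 samples
factor_lhc = (s_mc / s_lhc) ** 2        # roughly 90
factor_halton = (s_mc / s_halton) ** 2  # roughly 300
```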