How to improve a Poisson-based estimator using variance reduction techniques


Given a random variable $X \sim \operatorname{Pois}(\mu)$, where $\mu$ is itself random (drawn i.i.d.), I'm trying to estimate $P(X \ge c)$ by simulation. The raw/classic approach is to generate a bunch of $X$, by first generating an equally large number of $\mu$, and then count how many satisfy $X \ge c$ by taking the mean of $$I(x) = \begin{cases} 0, & x < c \\ 1, & x \ge c \end{cases}$$

$\mu$ is a random number with a density function of the form $f(x) = \frac{c_1}{c_2 + g(x)}$ for $x \in (0, 1)$, where $g(x)$ is a monotone, decreasing, non-linear function.
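For concreteness, here is a minimal sketch of the raw estimator. Since $g$, $c_1$, $c_2$ and the threshold are not specified in the question, placeholder values are assumed throughout (and since $g$ is decreasing, $f$ is increasing on $(0,1)$, so rejection sampling against the bound $f(1)$ works; $c_1$ need not normalise $f$ for that):

```python
import random
import math

# --- Hypothetical placeholders: g, c1, c2 and the threshold c are NOT
# --- given in the question; concrete values are assumed for illustration.
def g(x):
    return math.exp(-5 * x)   # assumed: monotone, decreasing, non-linear

c1, c2 = 1.0, 0.5             # assumed constants
c = 9                         # assumed threshold

def f(x):
    return c1 / (c2 + g(x))

def sample_mu():
    """Draw mu from the density f on (0, 1) by rejection sampling.
    g decreasing => f increasing, so f is bounded above by f(1)."""
    bound = f(1.0)
    while True:
        x = random.random()
        if random.random() * bound <= f(x):
            return x

def raw_estimate(n):
    """Crude Monte Carlo: one Poisson draw per mu, mean of the indicator."""
    hits = 0
    for _ in range(n):
        mu = sample_mu()
        # inverse-transform Poisson draw (fine for small mu)
        u, k, term = random.random(), 0, math.exp(-mu)
        cdf = term
        while u > cdf:
            k += 1
            term *= mu / k
            cdf += term
        if k >= c:
            hits += 1
    return hits / n
```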

I'm trying to find out how to improve the estimator using a few different techniques, namely importance sampling, conditioning, control variate and antithetic variables, but I'm having little luck in fully grasping how to do this.

Since the event $\{X \ge c\}$ is rare, I've had to generate a huge number of random values to get an estimate at all. Using more than $10^{9}$ samples, the runs always gave (roughly) the same estimate, on the order of $10^{-6}$, with essentially identical variance.

I haven't gotten around to importance sampling yet, so please ignore that for the moment. (I'm not asking for the solution, just pointers to figure out what I'm doing wrong.)

Conditioning

Since it's possible to calculate $P(X \ge c)$ exactly for a known $\mu$, I've tried doing that directly for a set of sampled $\mu$; this yields the same estimate as the raw estimator and greatly improves the variance, from the raw $10^{-6}$ down to the order of $10^{-12}$. This feels a bit like cheating, though, and I'm not entirely convinced that it actually counts as conditioning.
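This is in fact conditioning: by the tower rule, $E[I(X)] = E\big[P(X \ge c \mid \mu)\big]$, and replacing the indicator by its conditional expectation never increases variance (Rao-Blackwell). A minimal sketch, assuming the $\mu$ samples are already available:

```python
import math

def poisson_tail(c, mu):
    """P(X >= c) for X ~ Pois(mu), via 1 - sum_{k < c} pmf(k)."""
    if c <= 0:
        return 1.0
    term = math.exp(-mu)   # pmf at k = 0
    cdf = term
    for k in range(1, c):
        term *= mu / k     # pmf recursion: p(k) = p(k-1) * mu / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def conditional_estimate(mus, c):
    """Rao-Blackwellised ("conditioned") estimator: average the exact
    conditional tail probability over the sampled mu values.  No Poisson
    draws are needed, which is where the variance reduction comes from."""
    return sum(poisson_tail(c, mu) for mu in mus) / len(mus)
```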

Failing to use control variates

As with antithetic variables, I haven't managed to successfully implement control variates. I've tried three approaches:

  • Generating $U_1 \sim U(0,1)$, calculating the means $\mu$ from $U_1$, and then generating a sample of $X$ from those means. I then generate $U_2$ and use it to generate a normally distributed $Y$ whose standard deviation is based on $U_2$.
  • As above, but using $U_1$ instead of $U_2$ to generate $Y$.
  • As both of the above, but with some distribution other than the normal.

The estimate is then based on $X + b (Y - E[Y])$ with $b = - \frac{\operatorname{cov}(X,Y)}{\operatorname{var}(Y)}$ (writing $b$ rather than $c$, since $c$ already denotes the threshold). Using these methods, I either get an estimate identical to the raw estimator with no improvement in variance, or something on the order of $10^{-4}$ or $10^{-12}$.
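One observation on the first approach: a $Y$ built from an independent $U_2$ has $\operatorname{cov}(X, Y) = 0$, so the adjustment term vanishes and no variance reduction is possible; the control must share randomness with the target, and its mean $E[Y]$ must be known. A natural choice here is $Y = \mu$ itself (its mean is $\int_0^1 x f(x)\,dx$, which could be computed numerically). A sketch of the generic adjustment, with the caveat that estimating $b$ from the same sample introduces a small bias that vanishes as $n$ grows:

```python
def control_variate_estimate(zs, ys, ey):
    """Control-variate adjustment.

    zs -- samples of the target (e.g. conditional tail probabilities)
    ys -- a correlated control generated from the SAME randomness
          (e.g. the mu values themselves; assumed, not from the question)
    ey -- the known mean E[Y] of the control
    """
    n = len(zs)
    zbar = sum(zs) / n
    ybar = sum(ys) / n
    cov = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys)) / (n - 1)
    var = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    b = -cov / var                      # estimated optimal coefficient
    return zbar + b * (ybar - ey)      # adjusted estimate
```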

I need to use this on both the conditioned estimator and the raw estimator.

Failing to use antithetic variables

My approach for antithetic variables is to generate $U \sim U(0,1)$ and then calculate $Y_1$ using $U$ and $Y_2$ using $1 - U$ -- i.e. $Y_1$ and $Y_2$ are clearly negatively correlated and both uniform on $(0,1)$. I've then tried two approaches to get an estimate:

  • Generate $X_1$ using $Y_1$ and $X_2$ using $Y_2$, then take the mean of the estimators based on $X_1$ and $X_2$.
  • Generate $X_1$ using $\frac{Y_1 + Y_2}{2}$ and use $X_1$ to find the estimate.

Using both methods, I always get an estimate that is only about $\frac{3}{5}$ of what the raw estimator gives me. Clearly not correct.

This only needs to be done for the conditioned estimator and not for the raw estimator.
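For what it's worth, antithetic pairing has to happen at the level of the estimator, not the inputs: feeding $\frac{Y_1 + Y_2}{2} = \frac{1}{2}$ into the generator (the second bullet) changes the distribution of $X$ and biases the result. A sketch of the first approach applied to the conditioned estimator, under the assumption (not stated in the question) that $\mu$ is produced from $U$ by a monotone map such as an inverse CDF, so that the negative correlation of $(U, 1-U)$ carries through:

```python
import math
import random

def inv_cdf_mu(u):
    """Hypothetical inverse CDF of mu's distribution; in practice this
    would be derived (or numerically inverted) from f.  A placeholder
    monotone map is used purely for illustration."""
    return u ** 2   # assumed; any monotone map of U fits the sketch

def poisson_tail(c, mu):
    """P(X >= c) for X ~ Pois(mu)."""
    if c <= 0:
        return 1.0
    term = math.exp(-mu)
    cdf = term
    for k in range(1, c):
        term *= mu / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def antithetic_estimate(n_pairs, c):
    """Average the conditioned estimator over antithetic pairs (U, 1-U).
    Since inv_cdf_mu is monotone, mu(U) and mu(1-U) -- and hence the two
    tail probabilities -- are negatively correlated, which is what
    reduces the variance of each pair average."""
    total = 0.0
    for _ in range(n_pairs):
        u = random.random()
        z1 = poisson_tail(c, inv_cdf_mu(u))
        z2 = poisson_tail(c, inv_cdf_mu(1.0 - u))
        total += 0.5 * (z1 + z2)
    return total / n_pairs
```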