Combining importance sampling with enumeration for estimating expected value


I have a Monte Carlo simulation which, given an initial state, does some random stuff and outputs a scalar. Let this output be the random variable $Y$. The simulation takes place on a $K \times K$ grid, such as the one below with $K=4$:

(0, 0) (0, 1) (0, 2) (0, 3)
(1, 0) (1, 1) (1, 2) (1, 3)
(2, 0) (2, 1) (2, 2) (2, 3)
(3, 0) (3, 1) (3, 2) (3, 3)

States are identified by their (row, col) indices starting from top left.

My goal in this setup is to estimate $E[Y]$ over all the states. I have some prior information about the simulation that says the variance in the value of $Y$ is larger for states that are closer to the bottom right corner. Therefore, when performing simulations to estimate $E[Y]$, I want to bias my sampling of initial states towards this bottom right corner.

Let the row and column indices of a state $s$ be accessed via the notation $s.r$ and $s.c$, respectively, and let $S$ be the entire set of states. I could construct a probability distribution to sample from that achieves my goal as follows, using the sum of the state row and column to bias sampling towards the bottom right:

$g(s) = \frac{s.r + s.c}{\sum_{s' \in S}{(s'.r + s'.c)}}$
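As a concrete sketch (the grid size $K=4$ and the NumPy-based sampling are my illustrative choices, not part of the question), the biased distribution $g$ can be built and sampled from like this:

```python
import numpy as np

K = 4
# Enumerate all states (row, col) on the K x K grid.
states = [(r, c) for r in range(K) for c in range(K)]

# Unnormalized weights s.r + s.c bias sampling toward the bottom-right corner.
raw = np.array([r + c for r, c in states], dtype=float)
g = raw / raw.sum()  # the biased sampling distribution g(s)

# Draw one initial state from g.
rng = np.random.default_rng(0)
idx = rng.choice(len(states), p=g)
start = states[idx]
```

Note that $g((0,0)) = 0$ under this definition, so the top-left state can never be drawn in the importance-sampling phase; it is only covered by the enumeration phase described below.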

This distribution $g$ is the biased distribution I use for importance sampling. The ''true'' distribution (let's call it $f$) is simply a uniform distribution over $S$.

Because I can't do an infinite number of simulations, I have a budget of $m$ simulations that I can perform to estimate $E[Y]$. At this point, I could just perform $m$ simulations, then perform the importance correction to give me my estimate.

But here's the extra complication! I also need to guarantee that each $s \in S$ is selected as the initial state at least once. By just doing pure importance sampling, I am at the mercy of the biased distribution $g$: it is possible that, in my $m$ trials, not all the states will be sampled from $g$.

So, I am currently performing a two-step procedure for estimating $E[Y]$, and am not sure if it is correct, or if there are alternatives. Currently, I actually perform $n = \vert S \vert + m$ trials, consisting of selecting each state in $S$ once (guaranteeing that every state is selected), followed by $m$ rounds of importance sampling.

When calculating $E[Y]$, I'm treating the first $\vert S \vert$ samples of $Y$ as having weight 1, and the other $m$ samples are given their importance-corrected weights, i.e. $\frac{f(s)}{g(s)}$. Then I just take a weighted average of these results over $i = 1 \ldots n$, where $Y_i$ and $w_i$ denote the $i^{\text{th}}$ simulation output and its weight, respectively:

$\hat{E}[Y] = \frac{\sum_{i=1}^{n}{Y_i \cdot w_i}}{\sum_{i=1}^{n}{w_i}}$
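Here is a minimal end-to-end sketch of the two-phase procedure and the weighted average above. The `simulate` function is a hypothetical stand-in for the real simulation, and $K=4$, $m=100$ are arbitrary illustrative values:

```python
import numpy as np

K = 4
states = [(r, c) for r in range(K) for c in range(K)]
n_states = len(states)

# Biased distribution g and uniform "true" distribution f over the states.
raw = np.array([r + c for r, c in states], dtype=float)
g = raw / raw.sum()
f = np.full(n_states, 1.0 / n_states)

rng = np.random.default_rng(0)

def simulate(state):
    # Placeholder for the actual Monte Carlo simulation: returns a scalar Y.
    r, c = state
    return r + c + rng.normal()

m = 100
ys, ws = [], []

# Phase 1: select every state once, each sample weighted 1.
for s in states:
    ys.append(simulate(s))
    ws.append(1.0)

# Phase 2: m draws from g, each weighted by the importance correction f/g.
# (States with g(s) = 0 are never drawn, so f/g is always well defined here.)
draws = rng.choice(n_states, size=m, p=g)
for i in draws:
    ys.append(simulate(states[i]))
    ws.append(f[i] / g[i])

ys, ws = np.array(ys), np.array(ws)
# Self-normalized weighted average over all n = |S| + m samples.
estimate = np.sum(ys * ws) / np.sum(ws)
```

Dividing by the sum of the weights (rather than by $n$) makes this a self-normalized estimator, which is the form written in the equation above.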

In my head, I thought of it as actually estimating $E[Y]$ in two separate manners and then combining them in a weighted average: the first estimate has weight $\vert S \vert$, and the second estimate has a weight equal to the sum of the $m$ importance-correction weights. Is this thinking correct? If so, can you tell me why, i.e. appeal to relevant mathematical laws? If not, why not?