I have a Monte Carlo simulation which, given an initial state, does some random stuff and outputs a scalar. Let this output be the random variable $Y$. The simulation takes place on a $K \times K$ grid, such as the one below with $K = 4$:
| (0, 0) | (0, 1) | (0, 2) | (0, 3) |
|---|---|---|---|
| (1, 0) | (1, 1) | (1, 2) | (1, 3) |
| (2, 0) | (2, 1) | (2, 2) | (2, 3) |
| (3, 0) | (3, 1) | (3, 2) | (3, 3) |
States are identified by their (row, col) indices, starting from the top-left corner.
My goal in this setup is to estimate $E[Y]$, where the initial state is drawn uniformly over all the states. I have some prior information about the simulation that says the variance of $Y$ is larger for states closer to the bottom-right corner. Therefore, when performing simulations to estimate $E[Y]$, I want to bias my sampling of initial states towards this corner.
Let the row and column indices of a state $s$ be accessed via the notation $s.r$ and $s.c$, respectively, and let $S$ be the entire set of states. I could construct a probability distribution to sample from that achieves my goal as follows, using the sum of the state row and column to bias sampling towards the bottom right:
$g(s) = \frac{s.r + s.c}{\sum_{s' \in S}{(s'.r + s'.c)}}$
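For concreteness, here is a minimal sketch of how I construct $g$ (Python; `K = 4` as in the example grid, and the flat state ordering is just a convenience):

```python
import numpy as np

K = 4  # grid size, as in the example above
states = [(r, c) for r in range(K) for c in range(K)]

# Unnormalized weight of each state: row index + column index
raw = np.array([r + c for r, c in states], dtype=float)
g = raw / raw.sum()  # biased sampling distribution g(s)

# Note: with this choice, state (0, 0) gets weight 0 under g.
assert np.isclose(g.sum(), 1.0)
```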
This distribution $g$ is the biased distribution I use for importance sampling. The "true" distribution (let's call it $f$) is simply the uniform distribution over $S$.
Because I can't do an infinite number of simulations, I have a budget of $m$ simulations that I can perform to estimate $E[Y]$. At this point, I could just perform $m$ simulations, then perform the importance correction to give me my estimate.
But here's the extra complication! I also need to guarantee that each $s \in S$ is selected as the initial state at least once. By just doing pure importance sampling, I am at the mercy of the biased distribution $g$: it is possible that, in my $m$ trials, not all the states will be sampled from $g$.
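To quantify this risk: under $m$ i.i.d. draws from $g$, a given state $s$ is never selected with probability $(1 - g(s))^m$, and a union bound sums these over all states. A small sketch (Python, reusing the $K = 4$ grid; `m = 100` is illustrative):

```python
import numpy as np

K, m = 4, 100
states = [(r, c) for r in range(K) for c in range(K)]
raw = np.array([r + c for r, c in states], dtype=float)
g = raw / raw.sum()

# Probability each state is never drawn in m i.i.d. samples from g,
# plus a union bound on the event that at least one state is missed.
p_miss = (1.0 - g) ** m
union_bound = p_miss.sum()

# For this particular g, state (0, 0) has g(s) = 0, so it is
# missed with probability 1 under pure importance sampling.
```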
So, I am currently performing a two-step procedure for estimating $E[Y]$, and I am not sure if it is correct, or if there are alternatives. I actually perform $n = \vert S \vert + m$ trials: first I select each state in $S$ once (guaranteeing that every state is visited), then I do $m$ rounds of importance sampling.
When calculating $E[Y]$, I treat the first $\vert S \vert$ samples of $Y$ as having weight 1, and give the other $m$ samples their importance-corrected weights $\frac{f(s)}{g(s)}$. I then take a weighted average of these results: for $i = 1 \ldots n$, letting $Y_i$ and $w_i$ denote the $i^{th}$ output of the simulation and its weight, respectively:
$\hat{E}[Y] = \frac{\sum_{i=1}^{n}{Y_i \cdot w_i}}{\sum_{i=1}^{n}{w_i}}$
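Putting the two steps together, here is a sketch of the full procedure (Python; `simulate` is a stand-in for my actual simulation, and `K`, `m`, and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 4, 100

states = [(r, c) for r in range(K) for c in range(K)]
raw = np.array([r + c for r, c in states], dtype=float)
g = raw / raw.sum()                           # biased distribution
f = np.full(len(states), 1.0 / len(states))   # uniform "true" distribution

def simulate(state):
    # Stand-in for the real simulation: returns a random scalar Y.
    r, c = state
    return rng.normal(loc=r + c, scale=1.0)

ys, ws = [], []

# Step 1: run every state once, each with weight 1.
for s in states:
    ys.append(simulate(s))
    ws.append(1.0)

# Step 2: m draws from g, each with importance-corrected weight f/g.
idx = rng.choice(len(states), size=m, p=g)
for i in idx:
    ys.append(simulate(states[i]))
    ws.append(f[i] / g[i])

# Weighted average over all n = |S| + m samples.
estimate = np.dot(ys, ws) / np.sum(ws)
```

(Division by $g$ is safe here because any state with $g(s) = 0$ can never be drawn in step 2.)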
In my head, I think of it as estimating $E[Y]$ in two separate manners and then combining them in a weighted average: the first estimate has weight $\vert S \vert$, and the second has weight equal to the sum of the $m$ importance-correction weights. Is this thinking correct? If so, can you tell me why, i.e. appeal to relevant mathematical laws? If not, why not?