Number of Simulations Required Before Some Condition is Met?


Here is a question I have recently been thinking about:

My Problem: Let $X$ be a random variable with probability density function $f(x)$. On average, how many random values need to be simulated from $X$ until a value $X_i$ satisfying $E(X) - \epsilon \leq X_i \leq E(X) + \epsilon$ is encountered for the first time?

The first thing that comes to mind is to try to solve this type of problem via simulation.

For example, below I wrote some R code in which random numbers are repeatedly generated from a normal distribution with $\mu = 5$ and $\sigma = 5$ until a number between $5 - \epsilon$ and $5 + \epsilon$ (here $\epsilon = 0.1$) is generated for the first time. This is then repeated many times and the results are visualized:

epsilon <- 0.1
sim_count <- function(epsilon) {
    count <- 1                          # the first draw counts as a simulation too
    n <- rnorm(1, mean = 5, sd = 5)
    while (n < 5 - epsilon || n > 5 + epsilon) {
        n <- rnorm(1, mean = 5, sd = 5)
        count <- count + 1
    }
    return(count)
}

results <- replicate(1000, sim_count(epsilon))

df <- data.frame(results)

library(ggplot2)


mean_value <- mean(results)

ggplot(df, aes(x = results)) +
  geom_histogram(binwidth = 10, fill = "skyblue", color = "black") +
  labs(title = paste("Number of Simulations Until a Value Within epsilon of the Mean of N(5, sd = 5):",
                     "\nMean =", round(mean_value, 1), "Simulations"),
       x = "Number of Samples", y = "Frequency") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))


My Question: Although this approach appears to work, is there an exact mathematical approach that can be used to solve this problem?

The first thing that comes to mind is the "mean time to absorption" for Markov chains - but I am only aware of this in the discrete case (whereas my problem seems to be a continuous one).

As such, I do not know how to answer my original question mathematically. I am wondering if there is some formula that relates $n$ (the average number of simulations) to $\epsilon$, $f(x)$ and $E(X)$.

Can someone please help me with this?

Thanks!

Notes:

  • Using logic, I am guessing that larger values of $\epsilon$ are associated with smaller values of $n$: the $\epsilon$ threshold is larger and more generous, allowing the desired condition to be satisfied at an earlier point in time.

  • For example, if $m > n$: whenever $E(X) - n\epsilon \leq X_i \leq E(X) + n\epsilon$ is satisfied, then $E(X) - m\epsilon \leq X_i \leq E(X) + m\epsilon$ is automatically satisfied as well.

  • Depending on what value you are waiting to observe: I also think that the mean ($E(X)$) and the variance ($E(X^2) - E(X)^2$) of the distribution, relative to $\epsilon$, might accelerate or decelerate the number of simulations required to observe the first occurrence.

Best Answer

For each $X_i$, the probability of success $$p=P(\vert X_i - \mathbb{E}X\vert<\epsilon)$$

can be computed via $$p=\int_{\mathbb{E}X-\epsilon}^{\mathbb{E}X+\epsilon}f(x)\mathrm{d}x.$$

Since the $X_i$ are i.i.d., the time until the first success is distributed $T\sim \text{Geometric}(p)$, thus $\mathbb{E}T =\frac{1}{p}.$
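As a quick sanity check of this formula against the simulation above (a sketch, assuming the same example with mean 5, standard deviation 5, and $\epsilon = 0.1$):

```r
# Exact expected number of draws for the N(mu = 5, sd = 5) example above
epsilon <- 0.1
p <- pnorm(5 + epsilon, mean = 5, sd = 5) - pnorm(5 - epsilon, mean = 5, sd = 5)
expected_draws <- 1 / p   # E[T] = 1/p for a Geometric(p) first-success time
expected_draws           # roughly 62.7, close to the simulated mean
```

For small $\epsilon$ this is approximately $1 / (2\epsilon f(E(X)))$, since the integral of $f$ over the narrow band is about $2\epsilon f(E(X))$.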