Here is a question I have recently been thinking about:
My Problem: Let $X$ be a random variable with probability density function $f(x)$. On average, how many values need to be simulated from $X$ until a value $X_i$ satisfying $E(X) - \epsilon \leq X_i \leq E(X) + \epsilon$ is encountered for the first time?
The first thing that comes to mind is to try to solve this type of problem via simulation.
For example, below I wrote some R code in which random numbers are repeatedly generated from a normal distribution with $\mu = 5$ and $\sigma^2 = 5$ until a number between $5 - 0.01$ and $5 + 0.01$ is generated for the first time. This is then repeated many times and the results are visualized:
epsilon <- 0.01

# Count how many draws from N(5, 5) are needed until one lands
# within epsilon of the mean (the first draw counts as a trial)
sim_count <- function(epsilon) {
  count <- 1
  n <- rnorm(1, 5, sqrt(5))  # rnorm() takes the sd, so sqrt(5) gives variance 5
  while (n < 5 - epsilon || n > 5 + epsilon) {
    n <- rnorm(1, 5, sqrt(5))
    count <- count + 1
  }
  return(count)
}

results <- replicate(1000, sim_count(epsilon))
df <- data.frame(results)
library(ggplot2)
mean_value <- round(mean(results), 2)
ggplot(df, aes(x = results)) +
geom_histogram(binwidth = 10, fill = "skyblue", color = "black") +
labs(title = paste("Number of Simulations Until a Draw Within epsilon of the Mean of N(5,5):",
                   "\nMean =", mean_value, "Simulations"),
x = "Number of Samples", y = "Frequency") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
My Question: Although this approach appears to work, is there an exact mathematical approach that can be used to solve this problem?
The first thing that comes to mind is "mean time to absorption" for Markov chains, but I am only aware of this in the discrete case (whereas my problem seems to be a continuous one).
As such, I do not know how to answer my original question mathematically. I am wondering if there is some formula that relates $n$ (the average number of simulations) to $\epsilon$, $f(x)$, and $E(X)$.
Can someone please help me with this?
Thanks!
Notes:
Using logic, I am guessing that larger values of $\epsilon$ should be associated with smaller values of $n$: the $\epsilon$ threshold is larger and more generous, allowing the desired condition to be satisfied at an earlier point in time.
For example, if $m > n$: whenever $E(X) - n\epsilon \leq X_i \leq E(X) + n\epsilon$ is satisfied, then $E(X) - m\epsilon \leq X_i \leq E(X) + m\epsilon$ is automatically satisfied.
I also suspect that the variance of the distribution ($E(X^2) - E(X)^2$) relative to $\epsilon$ might accelerate or decelerate the number of simulations required to observe the first occurrence.
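A quick numerical check of both intuitions (a sketch, not part of my original code; draws_until_hit is a hypothetical self-contained variant of sim_count above, with the mean and sd made into parameters):

```r
set.seed(1)

# Number of N(mu, sd^2) draws until one lands within eps of mu
draws_until_hit <- function(eps, mu = 5, sd = sqrt(5)) {
  count <- 1
  while (abs(rnorm(1, mu, sd) - mu) > eps) count <- count + 1
  count
}

# Larger eps -> fewer draws needed on average
mean(replicate(2000, draws_until_hit(eps = 0.1)))
mean(replicate(2000, draws_until_hit(eps = 0.5)))

# Larger sd (relative to eps) -> more draws needed on average
mean(replicate(2000, draws_until_hit(eps = 0.1, sd = 2 * sqrt(5))))
```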

For each $X_i$, the probability of success $$p=P(\vert X_i - \mathbb{E}X\vert<\epsilon)$$
can be computed via $$p=\int_{\mathbb{E}X-\epsilon}^{\mathbb{E}X+\epsilon}f(x)\mathrm{d}x.$$
Since the $X_i$ are i.i.d., the time until the first success is distributed $T\sim \text{Geometric}(p)$, thus $\mathbb{E}T =\frac{1}{p}.$
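For the normal example in the question, this can be checked numerically (a sketch: pnorm() gives the normal CDF, and the simulated mean is random, so it only approximately matches $1/p$):

```r
set.seed(42)
mu <- 5; sd <- sqrt(5); eps <- 0.1

# Success probability p = P(|X - mu| < eps) from the normal CDF
p <- pnorm(mu + eps, mu, sd) - pnorm(mu - eps, mu, sd)
expected_trials <- 1 / p

# Monte Carlo estimate of the mean waiting time for comparison
one_run <- function() {
  count <- 1
  while (abs(rnorm(1, mu, sd) - mu) > eps) count <- count + 1
  count
}
simulated_trials <- mean(replicate(5000, one_run()))

c(theoretical = expected_trials, simulated = simulated_trials)
```

Here $T$ counts trials up to and including the first success, so the first draw in one_run() starts the count at 1, matching $\mathbb{E}T = 1/p$.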