I don't understand why we need to multiply by (b-a). Here b-a = 4-2 = 2. I think we already generate random x from 2 to 4, so we've already taken the interval into account (it is not (0,1)). Why do we multiply by (b-a) again?
Simple Monte Carlo integration about b-a
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 solutions below.
Maybe comparing Monte Carlo integration with Riemann approximation will help you understand the former.
Suppose we want to approximate the integral $J = \int_a^b e^{-x}\,dx$ by $n=1000$ rectangles of widths $w$ and heights $h = e^{-m},$ where $m$ is the midpoint of the base of a rectangle. In R,
a = 2; b = 4
n = 1000
w = (b-a)/n
m = seq(a+w/2, b-w/2, length=n)
h = exp(-m)
sum(w*h)
[1] 0.1170196
This is a deterministic computation, so each run will give the same answer.
You are integrating the density function of the distribution $\mathsf{Exp}(\mathrm{rate\,}=\lambda = 1).$ In R, the CDF of this distribution is pexp (with default $\lambda=1$), so the exact value of the integral is a difference of two CDF values. The Riemann approximation with 1000 evenly spaced grid points as centers of rectangles matches it to all seven digits shown.
diff(pexp(c(2,4)))
[1] 0.1170196
A basic Monte Carlo integration replaces the evenly spaced points above with randomly chosen points $m$ distributed as $\mathsf{Unif}(a,b).$ For this one-dimensional integration, randomly chosen points are not as efficient as an evenly spaced grid, so I will use $n = 10\,000$ random points (uniformly distributed).
set.seed(2021)
a = 2; b = 4
n = 10000
w = (b-a)/n
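m = runif(n, a, b)   # draw n random points; the first argument is the count n, not m
h = exp(-m)
sum(w*h)             # close to 0.117; the exact digits depend on the seed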
This is a random procedure, so (for different seeds, or no set seed) the result will be slightly different on each run.
Each of the $n$ random points is given a rectangle of width $w = (b-a)/n,$ so sum(w*h) is the same as (b-a)*mean(h). If I had used mean instead of sum, as you did, that would take care of the $1/n,$ but the length $b-a$ of the interval over which the integration takes place is still needed.
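A minimal sketch of that equivalence, reusing a, b, and n from the code above. The seed is arbitrary, and the names est.sum, est.mean, avg.only, and exact are introduced here only for illustration.

```r
set.seed(2021)                 # arbitrary seed, for reproducibility only
a = 2; b = 4
n = 10000
m = runif(n, a, b)             # n random points in [a, b]
h = exp(-m)                    # heights of the integrand at those points
est.sum  = sum((b - a)/n * h)  # sum of rectangle areas, each of width (b-a)/n
est.mean = (b - a) * mean(h)   # average height times the interval length
avg.only = mean(h)             # average height alone: too small by the factor b-a
exact = exp(-a) - exp(-b)      # exact value of the integral
c(est.sum, est.mean, avg.only, exact)
```

The first two entries are the same number computed two ways, and both should be near the exact value; the third is only about half of it, because the factor $b-a = 2$ is missing.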
Note: Sampling method. For Monte Carlo integration in one dimension, a 'sampling method' is sometimes used (often more for its ease of programming than for its efficiency). If we use the R function rexp to sample a million observations from $\mathsf{Exp}(\lambda=1),$ then we can ask what proportion of them lie between $2$ and $4.$
set.seed(120)
x = rexp(10^6)
mean((x>2)&(x<4))
[1] 0.117494
The vector (x>2)&(x<4) is a logical vector with a million elements, each TRUE or FALSE; the mean of a logical vector is its proportion of TRUEs.
This method is especially convenient when it would be some trouble to find the density function of the distribution of interest. Suppose the time to completion of a 2-phase process is the sum of $Z \sim \mathsf{Norm}(\mu=30,\sigma = 5)$ and $X \sim \mathsf{Exp}(\lambda = .05).$ Then the waiting time to completion is $W = Z+X.$ The average waiting time is $E(W) = E(Z)+E(X) = 30+20 = 50,$ and the simulation below gives $P(W > 60) \approx 0.22935 \pm 0.00084.$
set.seed(109)
z = rnorm(10^6, 30,5)
x = rexp(10^6, .05)
w = z + x
mean(w > 60)
[1] 0.229349
2*sd(w > 60)/1000
[1] 0.0008408287
hist(w, prob=T, br=60, col="skyblue2",
main="Time to Completion")
abline(v = 60, col="red", lwd=2)
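The margin $\pm 0.00084$ quoted above is two standard errors of the simulated proportion. As a rough sketch, the same margin can be obtained from the binomial formula $\sqrt{\hat p(1-\hat p)/n}$; the names p.hat and se are introduced here for illustration and are not part of the original code.

```r
set.seed(109)
n = 10^6
w = rnorm(n, 30, 5) + rexp(n, 0.05)   # simulated waiting times W = Z + X
p.hat = mean(w > 60)                  # estimated P(W > 60)
se = sqrt(p.hat * (1 - p.hat) / n)    # binomial standard error of a proportion
c(p.hat - 2*se, p.hat + 2*se)         # rough 95% interval for P(W > 60)
```

For a sample this large, 2*se should agree with 2*sd(w > 60)/1000 to several digits, since sd differs only by using $n-1$ instead of $n$ in the denominator.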
Note: Higher dimensions. The accuracy of a basic Monte Carlo integration depends in part on the "wiggliness" of the function $f(x)$ being integrated. However, generally speaking, Monte Carlo integrations in two (or higher) dimensions with $n$ randomly chosen points are about as good as (or better than) a Riemann approximation with a grid of $n$ evenly spaced points.
Example: The integral of a bivariate standard normal density (correlation $0)$ over the part of the unit disk in the first quadrant can be shown to be $J = 0.0984.$
pchisq(1,2)/4
[1] 0.09836734
We show Monte Carlo integration with $n$ points drawn uniformly in the unit square, of which a fraction of about $\pi/4$ fall in the quarter circle. Notice we average the random heights at those points and multiply by the area $\pi/4$ of the quarter circle.
set.seed(1234)
n = 100; u1 = runif(n); u2 = runif(n)
h.sq = dnorm(u1)*dnorm(u2)
h.qd = h.sq[u1^2 + u2^2 < 1]
(pi/4)*mean(h.qd)
[1] 0.09813584
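To compare the two approaches on this example, here is a sketch that evaluates the same integral with a midpoint grid and with the same number of random points. The grid size k = 100 and the choice n = k^2 are assumptions made for illustration, as are the names g, inside, riemann, and mc.

```r
k = 100                                  # grid points per axis (assumed)
g = (1:k - 0.5)/k                        # midpoints of k equal subintervals of (0,1)
x = rep(g, times = k); y = rep(g, each = k)   # all k^2 cell centers in the unit square
inside = x^2 + y^2 < 1                   # cells whose centers lie in the quarter circle
riemann = sum(dnorm(x[inside]) * dnorm(y[inside])) / k^2   # each cell has area 1/k^2

set.seed(1234)
n = k^2; u1 = runif(n); u2 = runif(n)    # same number of points, chosen at random
h = dnorm(u1) * dnorm(u2)
mc = (pi/4) * mean(h[u1^2 + u2^2 < 1])   # average height times area of quarter circle

exact = pchisq(1, 2)/4
c(riemann, mc, exact)
```

Both estimates should land close to the exact value; how they compare on a given run depends on the seed and on how the grid cells straddle the circular boundary.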



They are approaching this integration by approximating the average value of $e^{-x}$ over $[2,4]$ (i.e., the average y value) and multiplying it by the length of the x interval to get the estimate of the integral.
More precisely:
$$ \bar{f}_{[a,b]}= \frac{\int_a^b f dx}{b-a}$$
The code mean(exp(-x)) is estimating $\bar{f}_{[a,b]},$ so you need to multiply by the “base” of the rectangle you are making.
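Written out as a short derivation, assuming $x_1,\dots,x_n$ denote the uniform draws on $[a,b]$ (notation introduced here for illustration): the uniform density on $[a,b]$ is $1/(b-a),$ so the expected height at a random point is exactly the average value of $f,$

$$ E\,f(X) = \int_a^b f(x)\,\frac{1}{b-a}\,dx = \bar{f}_{[a,b]}, \qquad \text{hence} \qquad \int_a^b f\,dx = (b-a)\,\bar{f}_{[a,b]} \approx (b-a)\cdot\frac{1}{n}\sum_{i=1}^n f(x_i). $$

With $a=2,$ $b=4,$ and $f(x)=e^{-x},$ the right-hand side is 2*mean(exp(-x)).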