What is the purpose of Monte Carlo simulation?

If I perform a Monte Carlo simulation of a discrete random variable, I will get a list of results in a proportion that closely matches the probabilities of the discrete random variable e.g.

$X$ is a discrete random variable with distribution $P(X=0)=0.3$, $P(X=1)=0.3$, $P(X=2)=0.4$. If I perform a Monte Carlo simulation, I might get: $X = 2, 1, 1, 0, 2, 0, 2, 1, 0, 0, 1, 2, 0, 2, 2$.

I would then conclude that I am likely to get a result of $X=0$ five fifteenths of the time, $X=1$ four fifteenths of the time, and $X=2$ six fifteenths of the time. These fractions are all very close to the probabilities of the discrete random variable $(5/15 \approx 0.33 \approx 0.3,\ 4/15 \approx 0.27 \approx 0.3,\ 6/15 = 0.4)$.
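A simulation of this kind can be sketched in R (the language used in one of the answers below). The seed, the helper name `sim_freq`, and the two sample sizes are arbitrary choices made for illustration; they are not part of the original question.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
states <- 0:2
probs  <- c(0.3, 0.3, 0.4)       # P(X=0), P(X=1), P(X=2) as given above

# Relative frequency of each state in n simulated draws of X
sim_freq <- function(n) {
  x <- sample(states, n, replace = TRUE, prob = probs)
  table(factor(x, levels = states)) / n
}

small <- sim_freq(15)            # a short run of 15 draws
large <- sim_freq(10^5)          # a long run; frequencies settle near probs
small
large
```

With only 15 draws the frequencies can be noticeably off; with many draws they approach the stated probabilities.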

My question is, what is the point of performing the simulation in the first place? I already knew that I'd get $X=0$ about 30% of the time, $X=1$ about 30% of the time, and $X=2$ about 40% of the time, just by looking at the discrete random variable.

There are 3 best solutions below

0
On BEST ANSWER

Monte Carlo simulation is for approximating values that are not known or easily computed. Although an outcome may be dependent on events whose exact frequency is known, the frequency of outcomes may not be.

An early example is Buffon's needle: needles are tossed onto a plane ruled with parallel lines spaced one needle length apart. The fraction of needles that cross a line approximates $2/\pi$, so counting crossings yields a rational approximation of $\pi$. The events driving the trial can have uniform distributions, but the outcome is not so easily predicted.
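The needle experiment can be sketched in R under the usual assumptions: the needle length equals the line spacing (taken as 1), the centre of the needle falls uniformly between lines, and its angle is uniform. In that setting the crossing probability is $2/\pi$. The seed and variable names are illustrative, and drawing the angle already uses `pi`, so this shows the idea rather than a practical way to compute $\pi$.

```r
set.seed(1)                          # arbitrary seed
n <- 10^6                            # number of needles tossed

d     <- runif(n, 0, 0.5)            # distance from needle centre to nearest line
theta <- runif(n, 0, pi / 2)         # acute angle between needle and the lines

crosses <- d <= 0.5 * sin(theta)     # a unit-length needle reaches the line
pi_hat  <- 2 / mean(crosses)         # since P(cross) = 2 / pi
pi_hat
```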

0
On

I sometimes use simulation when an analytic solution is tedious or difficult, and sometimes to check my integration or combinatorics. The first two examples below are specific; with a million iterations each, the second decimal place should be accurate, and possibly the third. (Results are from R statistical software.)

$1.$ What is the probability of getting a total of 10 when a standard die is rolled five times? (Exact combinatorial and approximate normal answers are available.)

x <- replicate(10^6, sum(sample(1:6, 5, replace = TRUE)))
mean(x==10)
## 0.016221

$2.$ If $Z$ is standard normal and $X = |Z|$, then what is $SD(X)?$ (An exact result by integration is possible: $\sqrt{1 - \frac{2}{\pi}}= 0.6028103.$)

sd(abs(rnorm(10^6)))
## 0.6027942

$3.$ If there are 30 randomly chosen people in a room, what is the probability that no two of them share a birthday? This is a famous problem, and the exact solution is not difficult provided we assume 365 equally likely birthdays. But in real life birthdays are not equally likely. In the US there are relatively more births in summer months and December (about $\pm 7\%$). The exact solution is then difficult, but the problem is easy to simulate if you have data on actual probabilities in the population. (For the US population, the answer changes by a digit in the second decimal place.)

$4.$ Among normal samples of size 50, what percentage will show outliers in a boxplot? (I know of no way to get an exact analytic answer, but simulation is not difficult.)

$5.$ In some Bayesian statistical analyses, the prior distribution and the data are known, but the posterior distribution is not tractable. Then some simulation method such as Gibbs Sampling or use of the Metropolis-Hastings algorithm is necessary.

$6.$ Some statistical methods rely inherently on simulation. Bootstrap confidence intervals and permutation tests are examples. (For simple data and permutation tests, it may be possible to find the exact permutation distribution under the null hypothesis.)

$7.$ Partial differential equation models in weather forecasting are frequently solved by simulation.

$8.$ In consulting work sometimes a good approximation via simulation by noon today is of much more value than an exact analytic answer by noon tomorrow.
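As a sketch of example 3 in the list above, the birthday probability can be simulated for any vector of day probabilities. The function name `p_no_match`, the seed, and the iteration count are illustrative assumptions. Only the equally likely case is run here, where the exact product formula is available as a check; real birth frequencies, if you have them, would be passed as `prob` instead.

```r
set.seed(365)                                  # arbitrary seed

# Estimate P(no shared birthday) among n_people, given day probabilities prob
p_no_match <- function(n_people, prob, iters = 10^5) {
  days <- length(prob)
  mean(replicate(iters, {
    b <- sample(days, n_people, replace = TRUE, prob = prob)
    anyDuplicated(b) == 0                      # TRUE when all birthdays differ
  }))
}

uniform <- rep(1 / 365, 365)                   # 365 equally likely birthdays
sim   <- p_no_match(30, uniform)
exact <- prod((365 - 0:29) / 365)              # exact answer in the uniform case
c(simulated = sim, exact = exact)
```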

However, simulations need to be based on specific parameters. So you don't get general formulas. For publication in the mathematical sciences, I do not think that any results should be based on simulation, if an analytic solution is possible. When simulation is the only way, the margin of simulation error needs to be reported; and the software and seeds for the pseudorandom number generator should be supplied so readers can replicate the simulation.
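A minimal sketch of such reporting, using example 4 from the list above (boxplot outliers in normal samples of size 50). It assumes R's `boxplot.stats` with its default 1.5 IQR rule as the definition of an outlier; the seed and the number of simulated samples are arbitrary choices. The margin is the usual normal approximation for a simulated proportion.

```r
set.seed(2024)                     # arbitrary seed; report it so others can replicate
m <- 10^4                          # number of simulated samples of size 50

# TRUE when the boxplot of a sample flags at least one outlier
has_out <- replicate(m, length(boxplot.stats(rnorm(50))$out) > 0)

p_hat  <- mean(has_out)                            # estimated share of such samples
margin <- 1.96 * sqrt(p_hat * (1 - p_hat) / m)     # approx. 95% margin of simulation error
c(estimate = p_hat, margin = margin)
```

Quadrupling `m` roughly halves the margin, which is why the number of iterations belongs in the report alongside the seed.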

Soon we will have quantum computing. Some papers I have read indicate that (at a fundamental level) all answers based on quantum computing methods will be simulations.

0
On

Your example is not really a situation where someone would use Monte Carlo simulation. We already know the probability distribution of $X$, so if we are going to use Monte Carlo to find the distribution of something, it certainly wouldn't be $X$... But what about something related to $X$?

Consider $X$ to be a "source of randomness" in the evolution of the following quantity $Y(t)$, $$Y(t+1) = 0.5Y(t) + \sin(X(t))$$

I tell you that $Y(0) = 4$ and ask "what is the probability that $Y(10)$ is between $-1$ and $2.7$?" You might want to simulate this by drawing samples of $X$ and iterating $Y(t)$ from $t=0$ to $t=10$, then recording $Y(10)$. Do this 1000 times, and the empirical distribution of the recorded values approximates the distribution of $Y(10)$ given $Y(0) = 4$.
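A minimal R sketch of this procedure, assuming the $X(t)$ are independent draws from the distribution in the question ($0.3, 0.3, 0.4$ on the states $0, 1, 2$); the function name and seed are illustrative. With these particular numbers $\sin(X) \ge 0$ for every state, so $Y(10)$ always lands between $0$ and about $1.82$ and the requested probability comes out as 1. The last line therefore also estimates the chance that $Y(10)$ exceeds $1.5$, a threshold picked here only to show a less trivial event.

```r
set.seed(10)                                   # arbitrary seed

# One path: iterate Y(t+1) = 0.5 * Y(t) + sin(X(t)) for ten steps
simulate_y10 <- function(y0 = 4, steps = 10) {
  x <- sample(0:2, steps, replace = TRUE, prob = c(0.3, 0.3, 0.4))
  y <- y0
  for (xt in x) y <- 0.5 * y + sin(xt)
  y
}

y10 <- replicate(1000, simulate_y10())         # 1000 simulated values of Y(10)
mean(y10 > -1 & y10 < 2.7)                     # estimate of the requested probability
## 1
mean(y10 > 1.5)                                # hypothetical threshold, for illustration
```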

It doesn't have to be a difference (or differential) equation, though; it could also be a simple algebraic one, and we still might want to use Monte Carlo. However, in the discrete state case, for a simple algebraic equation like $Y=X^2$ it is easy to write down all possible outcomes and directly map the probabilities of states of $X$ to those of states of $Y$. So let's consider instead that $X$ is continuous and normally distributed with mean $\mu$ and variance $\sigma^2$. What is the probability that $Y$ is between $-3$ and $4.5$ if $Y=\sin(X)\cos(X)$? Again, Monte Carlo would be a decent way to attack the problem if you have a computer available.
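A sketch of the continuous case in R, with $\mu = 0$ and $\sigma = 1$ chosen arbitrarily since no values are given. One caveat the simulation makes visible: $\sin(X)\cos(X) = \tfrac{1}{2}\sin(2X)$ always lies in $[-0.5, 0.5]$, so for an interval as wide as $-3$ to $4.5$ the probability is 1. The second estimate uses a narrower interval, $-0.1$ to $0.3$, which is a hypothetical choice made only so that the result is not trivial.

```r
set.seed(7)                              # arbitrary seed
mu    <- 0                               # hypothetical mean
sigma <- 1                               # hypothetical standard deviation

x <- rnorm(10^5, mean = mu, sd = sigma)  # draws of X
y <- sin(x) * cos(x)                     # corresponding values of Y

range(y)                                 # stays within [-0.5, 0.5]
p_wide   <- mean(y > -3 & y < 4.5)       # the interval from the text
p_narrow <- mean(y > -0.1 & y < 0.3)     # hypothetical narrower interval
c(wide = p_wide, narrow = p_narrow)
```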