Probability that after 10,000 steps (+-1) you'll end up at the origin. How to use Central Limit Theorem?


Starting at the origin and taking one step left or right with equal probability, what is the probability that you'll end up at 0 after 10,000 steps?

I figured it'd be $\frac1{2^{10000}}\binom{10000}{5000}$ since you will be taking half of the steps in one direction and half in the other in no particular order and then divide the number of all possible paths that land you at 0 by the total possible number of paths.

I got probability of about 0.008.
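A quick sanity check of that value in Python (exact integer arithmetic, so the only rounding is in the final float):

```python
from math import comb

# Exact probability of returning to the origin after 10,000 steps:
# exactly 5,000 of the steps must go right.
n = 10_000
p_exact = comb(n, n // 2) / 2**n
print(f"{p_exact:.7f}")  # 0.0079786
```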

But how do I get this result using central limit theorem?

There are 3 best solutions below

BEST ANSWER

We shall consider Bernoulli random variables $X_i$ that take the value $1$ with probability $1/2$, corresponding to a step to the right at step $i$, and the value $0$ with probability $1/2$, corresponding to a step to the left at step $i$. Define

$$Y = \sum_{i=1}^{10000}X_i.$$

We wish to approximate

$$P(Y = 5000) = {10000 \choose 5000}\frac{1}{2^{10000}}$$

the probability that we end up at $0$ after 10000 steps. We shall use the DeMoivre-Laplace limit theorem (Feller $VII.3$ $3.16$) which is generalized by the central limit theorem (Feller $X.1$) to approximate this central binomial probability as

$$P(Y=5000) = P\left(-0.5 \le Y - 5000 \le 0.5 \right) = P\left(\frac{-0.5}{\sqrt{2500}} \le \frac{Y - 5000}{\sqrt{2500}} \le \frac{0.5}{\sqrt{2500}} \right)$$

$$\approx \Phi(0.5/\sqrt{2500}) - \Phi(-0.5/\sqrt{2500}) \approx 0.00797871$$

where $\Phi(x)$ is the standard normal cumulative distribution function. The $2500$ is the variance $npq$ of the binomial distribution, with $n = 10000$ and $p = q = 1/2$. We are in effect approximating the binomial distribution of $Y$, with mean $np$ and variance $npq$, by a normal distribution with the same mean and variance, and then normalizing $Y$ so that we may use the standard normal cdf. The factors of $\pm 0.5$ come not from the CLT but from the continuity correction needed when approximating the discrete binomial distribution by a continuous normal distribution: the interval between $-0.5$ and $+0.5$ on the normal corresponds naturally to the single value $0$ on the binomial.
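A sketch of the computation just described, using only the standard library (`phi` below is the standard normal cdf written via `math.erf`):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

n, p = 10_000, 0.5
var = n * p * (1 - p)         # npq = 2500
z = 0.5 / sqrt(var)           # continuity correction of +/-0.5, normalized
approx = phi(z) - phi(-z)
print(f"{approx:.8f}")        # 0.00797871
```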

Compare this numerical result to the exact answer, which to 8 decimal places is 0.00797865. This is slightly more accurate than the De Moivre-Laplace approximation based on the mass function, used in my other answer as well as in the answer of D. Thomine, which gives 0.00797885. In fact, we may verify that for even $n$ from 2 to several billion, the method in this answer is more accurate than that one, with the biggest differences coming at small $n$. By performing series expansions, one can verify that this is the case for all even $n$.
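The accuracy comparison is easy to spot-check; a small Python sketch measuring both approximation errors against the exact binomial value for several even $n$:

```python
from math import comb, erf, pi, sqrt

def phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

results = {}
for n in (2, 10, 100, 1000, 10_000):
    var = n / 4                                          # npq with p = q = 1/2
    exact = comb(n, n // 2) / 2**n
    cdf_err = abs(2 * phi(0.5 / sqrt(var)) - 1 - exact)  # cumulative normal
    pmf_err = abs(1 / sqrt(2 * pi * var) - exact)        # mass function
    results[n] = (cdf_err, pmf_err)
    print(n, cdf_err < pmf_err)                          # True at every tested n
```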

Note that some sources refer to this as the DeMoivre-Laplace central limit theorem.

Now an objection has been raised saying that we can't use the CLT to approximate this probability. Approximating a probability by the CLT means that we identify a probability which the CLT says converges as $n\rightarrow \infty$, and then approximate it for some finite $n$ by the limiting value the CLT gives. The approximation will be good to the extent that the probability has converged by that value of $n$. The error can also be bounded with something like the Berry-Esseen theorem if desired; for Bernoulli random variables, the error decreases like $1/\sqrt{n}$.

For our case, the CLT takes the form of the DeMoivre-Laplace limit theorem for Bernoulli random variables, which states

$$P\left(a\sqrt{npq} \le Y_n-np \le b\sqrt{npq}\right)\rightarrow \Phi(b)-\Phi(a)$$

with

$$Y_n = \sum_{i=1}^nX_i$$

and $p=q=1/2$. Now the objection concerns the fact that $a$ and $b$ must be constants, so that the interval grows as $\sqrt{n}$. We are in fact setting $a$ and $b$ to constants, specifically $a=-0.5/\sqrt{2500}$ and $b = +0.5/\sqrt{2500}$, not $\pm0.5/\sqrt{npq}$. The CLT then says that the probability that the walker is within $0.5/\sqrt{2500}$ standard deviations of the mean converges to the RHS above. The question asked about that probability for $n=10000$. We are using the CLT to approximate that probability, and the approximation will be as good as the convergence of that probability by $n=10000$.

Note that this probability does NOT correspond to the probability that the walker is at $0$, that is $P(Y_n=np)$, for all $n$; it coincides with that probability only for $n$ around $10000$. But that's OK, because we weren't asked to provide $P(Y_n=np)$ and show that it converges by the CLT. We were asked to approximate one probability using the CLT, in the sense defined above. The objection concerns $P(Y_n=np)$, an event on an interval which does not grow as $\sqrt{n}$ as the CLT requires. We are concerned with $P\left(|Y_n - np| \le \frac{0.5}{\sqrt{2500}}\sqrt{npq}\right)$, an event on an interval which does grow as $\sqrt{n}$, as the CLT requires.
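One can watch this convergence happen numerically: fix the constants $a = -1$, $b = 1$ and compare the exact binomial probability $P\left(a\sqrt{npq} \le Y_n - np \le b\sqrt{npq}\right)$ with its limit $\Phi(1) - \Phi(-1) \approx 0.6827$. A Python sketch (exact integer arithmetic for the binomial sums):

```python
from math import ceil, comb, erf, floor, sqrt

def phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

a, b = -1.0, 1.0
target = phi(b) - phi(a)              # Phi(1) - Phi(-1), about 0.6827
probs = {}
for n in (100, 1_000, 10_000):
    mean, sd = n / 2, sqrt(n) / 2     # np and sqrt(npq) with p = q = 1/2
    lo, hi = ceil(mean + a * sd), floor(mean + b * sd)
    probs[n] = sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n
    print(n, round(probs[n], 4))      # approaches the target as n grows
```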

Now if we were to use this method for general $n$, always taking $a$ and $b$ to be $\pm0.5/\sqrt{npq}$, then we could not show convergence of the resulting probabilities $P(|Y_n - np| \le 0.5)$ as $n\rightarrow \infty$ using the standard version of the CLT above, which requires that $a$ and $b$ be constants. However, we could still show convergence of these probabilities by applying a theorem (Feller VII.$3$, theorem 1) which relaxes this constraint for Bernoulli random variables, allowing $a$ and $b$ to be decreasing functions of $n$ that go as $1/\sqrt{n}$, as we would require.

Specifically, D. Thomine has objected that the above method is "false as it stands, because if $a$ and $b$ are too small, then you need in general very large values of $n$ before [convergence]. In particular, you cannot take $a,b \simeq 1/\sqrt{n}$." We have seen that this objection does not apply here, because we have Bernoulli random variables and can invoke a special theorem that guarantees convergence. It has also been refuted by direct calculation, which demonstrates that this method converges faster and better than the approximation based on the mass function for all even values of $n$ starting with $2$ and continuing into the billions. Series expansions for the approximations in question, as well as for the binomial, have been derived which, barring errors, show that the above method gives a value closer to the exact binomial value for all even $n$, in agreement with all calculations to date.

So in summary, we need not show convergence of our approximation method for $P(Y_n=np)$, and we cannot show it with the CLT above. However, this approximation method does produce values of $P(Y_n=np)$ which converge, and this can be shown with a theorem that applies specifically to Bernoulli random variables and is routinely used in conjunction with the DeMoivre-Laplace limit theorem for Bernoulli random variables (see examples in Feller VII.4).

ANSWER

As noted above, we can also use the De Moivre-Laplace theorem to approximate the central term of the binomial directly. This is a special case of the lattice form of the central limit theorem (Papoulis 8-5), obtained simply by substituting the normal pdf, sampled at the lattice points carrying probability, for the binomial pmf. We shall again consider Bernoulli random variables $X_i$ that take the value $1$ with probability $1/2$, corresponding to a step to the right at step $i$, and the value $0$ with probability $1/2$, corresponding to a step to the left at step $i$. Define

$$Y = \sum_{i=1}^{10000}X_i.$$

We wish to approximate

$$P(Y = 5000) = {10000 \choose 5000}\frac{1}{2^{10000}}$$

the probability that we end up at $0$ after $10000$ steps. De Moivre-Laplace tells us we can approximate this as

$$P(Y=5000) \approx \frac{1}{\sqrt{2\pi(2500)}} \approx 0.00797885$$

where the $2500$ is the variance $npq$ of the binomial, with $n=10000$ and $p=q=1/2$. Compare to the exact answer, which to 8 decimal places is 0.00797865, and to my previous approximation of 0.00797871 using the cumulative normal. We may verify that for even $n$ from $2$ to $1$ million, the cumulative normal is always more accurate, with the largest differences occurring for small $n$.
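A Python sketch of this mass-function estimate next to the exact value:

```python
from math import comb, pi, sqrt

n = 10_000
var = n / 4                              # npq with p = q = 1/2
approx = 1.0 / sqrt(2 * pi * var)        # normal pdf evaluated at the mean
exact = comb(n, n // 2) / 2**n
print(f"{approx:.7f}  {exact:.7f}")      # 0.0079788  0.0079786
```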

Note that my earlier answer which subtracted 2 values of the cumulative normal has also been referred to as the De Moivre-Laplace limit theorem (Feller) to distinguish it from the version in this answer.

ANSWER

You can obviously estimate this probability with Stirling's approximation, but you said you want to use the Central limit theorem. Unfortunately, you cannot do that!

Let $S_n$ be the sum of $n$ i.i.d. random variables $(X_k)$, such that $\mathbb{P} (X_1 = 1) = \mathbb{P} (X_1 = -1) = 1/2$ - here, $n=10000$. What you want to estimate is:

$$\mathbb{P} (S_{10000} = 0).$$

However, the Central limit theorem tells you that for all $a < b$,

$$\lim_{n \to + \infty} \mathbb{P} (a \sqrt{n} \leq S_n \leq b \sqrt{n}) = \frac{1}{\sqrt{2 \pi}} \int_a^b e^{-\frac{x^2}{2}}\ dx.$$

By the way, BruceZ' first answer is false as it stands, because if $a$ and $b$ are too small, then you need in general very large values of $n$ before the LHS gets close to the RHS. In particular, you cannot take $a, b \simeq 1 / \sqrt{n}$. In other words, the CLT tells you what happens at a medium scale ($\simeq \sqrt{n}$), but to get the probability of being in any given point, we work at a finer scale ($\simeq 1$).

To get an idea of what can go wrong, remark that $\mathbb{P} (S_{10001} = 0) = 0$, so that the ansatz of "just evaluating with the Gaussian" fails (see also Did's comments).

The kind of results you are interested in are called local central limit theorems. For completeness, I include a statement and a proof. On a first reading, you may skip the proof.

~~~~~

Theorem (Local CLT)

Let $(X_n)_{n \geq 0}$ be a family of integer-valued, independent, identically distributed random variables. Assume that:

  • $\sigma^2 := \mathbb{E} (X_1^2) < + \infty;$

  • $\mathbb{E} (X_1) = 0;$

  • There do not exist integers $M \geq 2$ and $0 \leq N < M$ such that $X_1 \in N + M\mathbb{Z}$ almost surely (aperiodicity).

Then for all $a \in \mathbb{Z}$,

$$\mathbb{P} (S_n = a) \sim \frac{1}{\sigma \sqrt{2 \pi n}} \quad \text{as } n \to + \infty.$$

~~~~~

Proof

Let $a \in \mathbb{Z}$, and let $n \geq 0$. Then:

$$\mathbb{P} (S_n = a) = \mathbb{E} \left(1_a (S_n)\right) = \mathbb{E} \left(\sum_{p \in \mathbb{Z}} 1_{a=p} 1_{S_n=p}\right) = \mathbb{E} \left(\frac{1}{2\pi} \int_{-\pi}^\pi e^{iax} e^{-i S_n x} \ dx\right),$$

by Parseval's formula. With Fubini's theorem (everything is bounded), we can exchange the expectation and the integral:

$$\mathbb{P} (S_n = a) = \frac{1}{2\pi} \int_{-\pi}^\pi e^{iax} \mathbb{E} (e^{-i S_n x}) \ dx = \frac{1}{2\pi} \int_{-\pi}^\pi e^{iax} \psi(-x)^n \ dx,$$

where $\psi$ is the characteristic function of $X_1$.

By a convexity argument, the hypothesis of aperiodicity is equivalent to the property that $|\psi (x)| < 1$ for all $x \notin 2 \pi \mathbb{Z}$. Let $\varepsilon > 0$. Then there exists $\delta > 0$ such that $|\psi| \leq 1-\delta$ on $[-\pi, -\varepsilon] \cup [\varepsilon, \pi]$. Hence,

$$\frac{1}{2\pi} \left| \int_{-\pi}^{-\varepsilon} e^{iax} \psi(-x)^n \ dx + \int_\varepsilon^\pi e^{iax} \psi(-x)^n \ dx\right| \leq (1-\delta)^n,$$

which decays exponentially fast, and is negligible. On the other hand,

$$\frac{1}{2\pi} \int_{-\varepsilon}^\varepsilon e^{iax} \psi(-x)^n \ dx = \frac{1}{2\pi \sqrt{n}} \int_{-\varepsilon \sqrt{n}}^{\varepsilon \sqrt{n}} e^{iax/\sqrt{n}} \psi(-x/\sqrt{n})^n \ dx$$

$$= \frac{1}{2\pi \sqrt{n}} \int_{\mathbb{R}} 1_{[-\varepsilon \sqrt{n}, \varepsilon \sqrt{n}]} e^{iax/\sqrt{n}} \psi(-x/\sqrt{n})^n \ dx.$$

Now, let us use the two other hypotheses on $X_1$. Remark that $\psi(x) = 1 - \sigma^2 x^2 / 2 + o (x^2)$. Hence, up to a small error on $\sigma$ (which depends on $\varepsilon$), we get for large $n$:

$$\mathbb{P} (S_n = a) \sim \frac{1}{2\pi \sqrt{n}} \int_{\mathbb{R}} e^{- \frac{\sigma^2 x^2}{2} (1+o_\varepsilon (1))} \ dx = \frac{(1+o_\varepsilon (1))}{\sigma \sqrt{2\pi n}}.$$

Finally, let $\varepsilon$ go to $0$.

~~~~~

The proof is quite technical, and I was less than rigorous in the handling of some of the error terms.

A few remarks. First, the result is exactly what you would expect from the CLT: the density at $0$ of a Gaussian of variance $n \sigma^2$ is exactly $1/(\sigma \sqrt{2 \pi n})$. The hypothesis of aperiodicity is necessary to avoid phenomena such as $\mathbb{P} (S_{10001} = 0) = 0$.

So, what does this tell us in your setting? Let's try to apply the theorem, with $\sigma = 1$ and $n = 10000$. We get $\mathbb{P}(S_{10000} = 0) \simeq 0.004$. That is not what we get with Stirling's approximation! Where did we make a mistake?

The reason is that, in your example, $X_1$ does not satisfy the aperiodicity assumption: it takes its values in $1+2\mathbb{Z}$. What happens is that, for $n$ close to $10000$: for even $n$, we get $\mathbb{P}(S_n = 0) \simeq 0.008$, and for odd $n$, we get $\mathbb{P}(S_n = 0) = 0$. The estimate $\mathbb{P}(S_n = 0) \simeq 0.004$ is only true on average (in some sense of averaging).

How to bypass this problem? Consider $S_{2n}/2$. This is a random walk on $\mathbb{Z}$, where the steps have the distribution of $Y := (X_1+X_2)/2$. But $\mathbb{P} (Y = -1) = \mathbb{P} (Y = 1) = 1/4$ and $\mathbb{P} (Y = 0) = 1/2$. This random walk is aperiodic! In addition, the steps $Y$ have variance $1/2$, so that, by our theorem,

$$\mathbb{P} (S_{2n} = 0) = \mathbb{P} \left(\frac{S_{2n}}{2} = 0\right) \sim \frac{1}{\sqrt{\pi n}}.$$
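The claimed step distribution and variance of $Y$ are quick to verify with exact rational arithmetic; a Python sketch:

```python
from fractions import Fraction

# Distribution of Y = (X1 + X2) / 2, where X1, X2 are +/-1, each w.p. 1/2
dist = {}
for x1 in (-1, 1):
    for x2 in (-1, 1):
        y = Fraction(x1 + x2, 2)
        dist[y] = dist.get(y, Fraction(0)) + Fraction(1, 4)

# dist is {-1: 1/4, 0: 1/2, 1: 1/4}: the walk can stay put, so it is aperiodic
var = sum(p * y**2 for y, p in dist.items())
print(var)                               # 1/2
```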

Now, take $n = 5000$, and get:

$$\mathbb{P} (S_{10000} = 0) \simeq \frac{1}{100} \sqrt{\frac{2}{\pi}} \simeq 0.0080.$$
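Numerically, this local-CLT estimate agrees with the exact binomial value to about four significant figures; a Python check:

```python
from math import comb, pi, sqrt

# Local CLT for the lazy walk S_{2n}/2 (step variance 1/2):
# P(S_{2n} = 0) ~ 1 / sqrt(pi * n).  Here 2n = 10000, so n = 5000.
n = 5000
approx = 1.0 / sqrt(pi * n)              # equals (1/100) * sqrt(2 / pi)
exact = comb(2 * n, n) / 4**n
print(f"{approx:.7f}  {exact:.7f}")      # 0.0079788  0.0079786
```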