Central Limit Theorem and Normal Approximation


Having started learning about the Central Limit Theorem just one day ago, I am already a bit confused; maybe you can help me see through the cloud of misunderstanding.

Let's assume I want the probability of at most 50 successes when I draw balls from an urn; the success probability is $p=0.9$, and I draw $n = 52$ times.
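For a concrete reference point, the exact answer can be computed straight from the binomial distribution. A quick sketch using only the standard library, with the numbers from the question:

```python
from math import comb

# Exact binomial probability P(S <= 50) for n = 52 draws with
# success probability p = 0.9 (the numbers from the question).
n, p, x = 52, 0.9, 50
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))
print(exact)  # roughly 0.9717
```

Any approximation below can be judged against this value.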

Now I'd like to take a look at this problem using the CLT and then the Normal Approximation:

The Central Limit Theorem (CLT) says:

  • $ \lim_{n \to \infty} P\left(\frac{\frac{S_n}{n} - \mu}{\sigma / \sqrt{n}} \leq z\right) = \Phi(z)$, where $\Phi(z)$ is the distribution function of the standard normal distribution and $\frac{S_n}{n}$ is the mean of the first $n$ random variables ($S_n$ is the sum of $n$ Bernoulli trials).

Taking this, how can I calculate the probability of a non-standard random variable with binomial distribution? Wouldn't it be:

$ \frac{\frac{S_n}{n} - \mu}{\sigma / \sqrt{n}} \leq z \Leftrightarrow S_n \leq \frac{z\cdot n \cdot \sqrt{n}}{\sigma} + \mu\cdot n^2$ and therefore:

$\lim_{n \to \infty} P(\frac{\frac{S_n}{n} - \mu}{\sigma / \sqrt{n}} \leq z) = \Phi(z) \Leftrightarrow \lim_{n \to \infty} P(S_n \leq \frac{z\cdot n \cdot \sqrt{n}}{\sigma} + \mu\cdot n^2) = \phi(\frac{z\cdot n \cdot \sqrt{n}}{\sigma} + \mu\cdot n^2)$

Now, using the normal approximation (with continuity correction):

$P(x_1 \leq S_n \leq x_2) \approx \Phi(\frac{x_2 + 0.5 - \mu}{\sigma}) - \Phi(\frac{x_1-0.5-\mu}{\sigma})$.

The two results are not identical at all. When do I "use" which one?

Best answer:

Your first result is incorrect, and you seem to be using $S_n$ to mean different things in your results. The statement of the Central Limit Theorem is for $S_n$ the sum of $n$ iid random variables, each with mean $\mu$ and standard deviation $\sigma$. It is true from the CLT that $$\lim_{n\to\infty}\mathbb{P}\left(\frac{\frac{S_n}{n}-\mu}{\sigma/\sqrt{n}}\leq z\right) = \Phi(z),$$ but your other statement is false. Since $$\frac{\frac{S_n}{n}-\mu}{\sigma/\sqrt{n}} \leq z \iff S_n \leq z \sigma\sqrt{n} + \mu n,$$ it follows that $$\mathbb{P}\left(\frac{\frac{S_n}{n}-\mu}{\sigma/\sqrt{n}}\leq z\right) = \mathbb{P}\left(S_n \leq z \sigma\sqrt{n} + \mu n\right)$$ for every $n$; since the two probabilities are equal, their limits (when they exist) are equal as well, and both equal $\Phi(z)$.
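This equivalence can be sanity-checked numerically. A sketch, assuming $S_n$ is binomial$(n, p)$ with $p = 0.9$, so the per-trial mean is $\mu = p$ and the per-trial standard deviation is $\sigma = \sqrt{p(1-p)}$:

```python
from math import comb, erf, sqrt

# Sketch: P(S_n <= z*sigma*sqrt(n) + mu*n) approaches Phi(z) as n grows.
# Here S_n is binomial(n, p), so the per-trial mean is mu = p and the
# per-trial standard deviation is sigma = sqrt(p*(1 - p)).
p, z = 0.9, 1.0
mu, sigma = p, sqrt(p * (1 - p))

def Phi(t):
    return 0.5 * (1 + erf(t / sqrt(2)))  # standard normal CDF

def binom_cdf(n, x):
    x = int(x)  # floor: S_n is integer-valued
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

for n in (10, 100, 1000):
    cutoff = z * sigma * sqrt(n) + mu * n
    print(n, binom_cdf(n, cutoff))  # approaches Phi(1) ≈ 0.8413
```

For growing $n$ the printed probabilities settle near $\Phi(1) \approx 0.8413$, which is the content of the limit statement.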

For the normal approximation to the binomial, your $S_n$ appears to be used as a binomial $(n,p)$ random variable with mean $\mu$ and standard deviation $\sigma$ (whereas before, $\mu$ and $\sigma$ were the mean and standard deviation of a single trial). If we use $S_n$ as before, then $S_n \leq x \iff \frac{\frac{S_n}{n}-\mu}{\sigma/\sqrt{n}} \leq \frac{x/n-\mu}{\sigma/\sqrt{n}}=\frac{x-n\mu}{\sigma\sqrt{n}},$ and applying the CLT above gives the same result as you state for the normal approximation, once you correct for the differing uses of $\mu$ and $\sigma$.

The final difference then comes from the continuity correction. In taking a normal approximation to the binomial, you are effectively generating a normally distributed random variable and then rounding it to the nearest integer to get the binomial result. This means that if you are looking for the probability your binomial distribution is at most $x$, you would want the probability that the normal approximation rounds to a value that is at most $x$, so that the normal approximation has value at most $x+0.5$.
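The effect is easy to see numerically. A sketch with the question's numbers, using the exact binomial CDF as the yardstick:

```python
from math import comb, erf, sqrt

# Normal approximation to P(S_n <= 50) with and without the continuity
# correction, for n = 52, p = 0.9, compared against the exact binomial CDF.
n, p, x = 52, 0.9, 50
mu, sigma = n * p, sqrt(n * p * (1 - p))

def Phi(t):
    return 0.5 * (1 + erf(t / sqrt(2)))  # standard normal CDF

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))
without_cc = Phi((x - mu) / sigma)       # plain normal approximation
with_cc = Phi((x + 0.5 - mu) / sigma)    # "round to nearest integer" view
print(exact, without_cc, with_cc)        # the corrected value is closer
```

The corrected value lands noticeably closer to the exact probability; the remaining gap comes from the skewness of the binomial at $p = 0.9$, which no continuity correction removes.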