I'm a first-year math student and I am having trouble with this exercise:
Let $S_n$ be the number of times we get heads when throwing a coin $n$ times. Let $Z_n = \frac{S_n}{n}$. With the inequality of Chebyshev we know that for $n \geqslant 125000 : \mathbb{P}(|Z_n - \frac{1}{2}| \geqslant 0,01) \leqslant 0,02$.
Show that, with the normal distribution, this holds for much smaller $n$.
Here is what I have done so far. I computed:
$\mathbb{E}(Z_n) = \frac{1}{n}\mathbb{E}(S_n) = \frac{1}{2}$
$\mathbb{V}ar(Z_n) = \frac{1}{n^2}\mathbb{V}ar(S_n) = \frac{1}{2n} - \frac{1}{4}$
Now
$\mathbb{P}(|Z_n - \frac{1}{2}| \geqslant 0,01) \leqslant 0,02$
$\mathbb{P}(Z_n \geqslant 0,51) + \mathbb{P}(Z_n \leqslant 0,49) \leqslant 0,02$
$1 - \Phi(\frac{0,01}{\sqrt{\frac{1}{2n} - \frac{1}{4}}}) + \Phi(\frac{-0,01}{\sqrt{\frac{1}{2n} - \frac{1}{4}}}) \leqslant 0,02$
And at the end I get $n \geqslant 1,999$, which is of course not correct.
Could you please tell me what I have done wrong, and how to get the correct answer (which is $n \geqslant 13573$)?
Thanks in advance!
I can't say what went wrong in the computation of the variance, but $S_n$ is $B\bigl(n,\frac{1}{2}\bigr)$-distributed, and the variance of a $B(n,p)$ distribution is $n\cdot p(1-p)$, so here we have $\operatorname{Var} S_n = \frac{n}{4}$ and hence $\operatorname{Var} Z_n = \frac{1}{4n}$.
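As a sanity check of the variance formula, here is a minimal Python sketch that computes the variance of a $B(n,p)$ distribution directly from its probability mass function, in exact rational arithmetic. The helper name `binomial_variance` and the choice $n = 10$ are only illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_variance(n, p):
    """Exact variance of B(n, p), computed directly from the pmf."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    return sum((k - mean) ** 2 * w for k, w in enumerate(pmf))

n = 10  # illustrative value, any positive integer works
var_s = binomial_variance(n, Fraction(1, 2))  # should equal n/4
var_z = var_s / n**2                          # should equal 1/(4n)
print(var_s, var_z)
```

For $n = 10$ this gives $\frac{5}{2} = \frac{n}{4}$ and $\frac{1}{40} = \frac{1}{4n}$, in agreement with the formulas above.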
For a $N(0,1)$-distribution, we have
$$P(\lvert X\rvert \geqslant c) = P(X\geqslant c) + P(X \leqslant -c) = 1-\Phi(c) + \Phi(-c) = 2(1-\Phi(c)),$$
and thus
$$P(\lvert X\rvert \geqslant c) \leqslant 0.02 \iff 2(1-\Phi(c)) \leqslant 0.02 \iff \Phi(c) \geqslant 0.99,$$
and, looking that up in a table and interpolating [per the table, $\Phi(2.32) = 0.9898$ and $\Phi(2.33) = 0.9901$], that means approximately $c \geqslant 2.327$.
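If no table is at hand, the quantile can also be obtained numerically. The following sketch uses `statistics.NormalDist` from the Python standard library and, for comparison, repeats the linear interpolation between the two table entries quoted above. The variable names are my own.

```python
from statistics import NormalDist

# 0.99-quantile of the standard normal distribution, i.e. c with Phi(c) = 0.99.
c_exact = NormalDist().inv_cdf(0.99)

# Linear interpolation between the two (rounded) table entries.
lo_c, lo_phi = 2.32, 0.9898
hi_c, hi_phi = 2.33, 0.9901
c_table = lo_c + (hi_c - lo_c) * (0.99 - lo_phi) / (hi_phi - lo_phi)

print(round(c_exact, 4), round(c_table, 4))
```

The interpolated value comes out marginally above the numerically computed quantile because the table entries are themselves rounded; both round to about $2.33$.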
If we make the simplifying assumption that $Z_n \sim N\bigl(\frac{1}{2},\frac{1}{4n}\bigr)$, that is, $A_n \sim N(0,1)$, where
$$A_n = \frac{Z_n - \frac{1}{2}}{\sigma(Z_n)} = 2\sqrt{n}\bigl(Z_n-\frac{1}{2}\bigr),$$
and want $P\bigl(\bigl\lvert Z_n-\frac{1}{2}\bigr\rvert \geqslant 0.01\bigr) \leqslant 0.02$, that translates to
$$P\bigl(\bigl\lvert Z_n -\tfrac{1}{2}\bigr\rvert \geqslant 0.01\bigr) = P\left(\frac{\bigl\lvert Z_n -\frac{1}{2}\bigr\rvert}{\sigma(Z_n)} \geqslant \frac{0.01}{\sigma(Z_n)}\right) \leqslant 0.02$$
and thus $\frac{0.01}{\sigma(Z_n)} \geqslant 2.327$, that is, $\frac{1}{\sigma(Z_n)} \geqslant 232.7$. Using $\sigma(Z_n) = \frac{1}{2\sqrt{n}}$, we obtain
$$\sqrt{n} \geqslant 50\cdot 2.327 = 116.35,$$
and finally
$$n \geqslant \bigl\lceil 116.35^2\bigr\rceil = \lceil 13537.3225\rceil = 13538.$$
That is close to the suggested answer of $n \geqslant 13573$, for which we get
$$\frac{\sqrt{13573}}{50} \approx \frac{116.5}{50} = 2.33,$$
so the difference seems to be that the suggested answer was computed from the larger $\Phi$ value without interpolation. Since we make an error when approximating $Z_n$ with a normal distribution, it is not unreasonable to take the slightly larger value without interpolation, so the likelihood of underestimating the required number by the normal approximation is smaller. It doesn't make much of a difference anyway.
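The last step and the comparison with the suggested answer can be checked with a few lines of Python. This is only a sketch under the normal-approximation assumption made above; the function name `min_n_normal` is hypothetical. For reference it also recomputes the Chebyshev threshold from the exercise, using $\operatorname{Var} Z_n/\varepsilon^2 = \frac{1}{4n\varepsilon^2} \leqslant \delta$.

```python
from fractions import Fraction
from math import ceil

def min_n_normal(c, eps=0.01):
    """Smallest n with eps / sigma(Z_n) >= c, where sigma(Z_n) = 1 / (2 sqrt(n))."""
    return ceil((c / (2 * eps)) ** 2)

n_interpolated = min_n_normal(2.327)  # interpolated table value
n_table = min_n_normal(2.33)          # larger table value, no interpolation

# Chebyshev bound in exact arithmetic: 1 / (4 n eps^2) <= delta.
eps, delta = Fraction(1, 100), Fraction(2, 100)
n_chebyshev = ceil(1 / (4 * eps**2 * delta))

print(n_interpolated, n_table, n_chebyshev)
```

With $c = 2.327$ this reproduces $13538$, with $c = 2.33$ it gives exactly the suggested $13573$, and the Chebyshev threshold is $125000$.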
If we take the erroneous formula $\frac{1}{2n} - \frac{1}{4}$ for the variance, the computation yields
$$\frac{1}{2n} - \frac{1}{4} \leqslant \frac{1}{232.7^2} \iff n \geqslant \frac{1}{2\bigl(\frac{1}{4} + \frac{1}{232.7^2}\bigr)} = \frac{2}{1+\frac{4}{232.7^2}} \approx 2 - \frac{8}{232.7^2} \approx 1.99985,$$
which is compatible with your result if you truncated after the third decimal place, but not if you rounded to four significant digits (that would give $2.000$). Since the part of the computation you presented is correct (except for the miscomputed variance), I tend to believe that you just truncated the evidently wrong value and that the miscomputed variance was the only error.
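The arithmetic with the erroneous variance, and the difference between truncating and rounding, can be reproduced as follows. This is a small sketch; the variable names are my own.

```python
from math import floor

# Bound obtained from the erroneous variance 1/(2n) - 1/4.
bound = 1 / (2 * (1 / 4 + 1 / 232.7**2))

truncated = floor(bound * 1000) / 1000  # cut off after three decimal places
rounded = round(bound, 3)               # round to three decimal places

print(bound, truncated, rounded)
```

Truncating gives $1.999$, whereas rounding gives $2.000$.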