Test if coin is fair with significance/confidence of 95%


Please take a look at the next 10 minutes of this lecture, starting here:

https://youtu.be/rYefUsYuEp0?t=1147

We are trying to find a number $\xi$ such that:

$P(|S_n-n \cdot (1/2)| \le \xi) \approx 0.95$

The professor finds $\xi=31$ but he says that he "pretends" $S_n$ is approximately standard normal. Why $S_n$? I apply the CLT literally and so I "pretend" that

$$Z_n = \frac{(S_n - n/2)}{(1/4)\cdot\sqrt{n}}$$ is standard normal.

By "pretend" I just mean the usual thing, i.e., that I get the right to use the standard normal tables
(as justified by the CLT).

In the standard normal table I looked for $0.975$ and I found the Z-score number $1.96$.

When I did all the computations I got $\xi = 15.495$, which is half as large. Why?

I am interpreting it this way: if I make 1000 coin tosses and if the number of heads I observe is no more than 15 away from 500, then I conclude with certainty of about 95% that my coin is fair.

Am I incorrect conceptually? Or did I mess up the calculations?
Or is there something else here which I am not taking into account?

A deviation/difference of 31 given 1000 coin tosses seems too much to me just intuitively. But intuition can lie.

Also, I am not sure why he says $S_n$ is approximately standard normal.
It should be $Z_n$, right?

Maybe the lecturer is oversimplifying just for presentation purposes,
and that's why he gets 31 and not 15.49. Is it indeed so?


There is 1 best solution below


The lecturer does not actually claim $S_n$ is approximately standard normal, only that it is approximately normal. See time index 24:40 in the linked video.

That said, this statement is itself inaccurate. With increasing $n$, the distribution of $S_n$ does not tend toward a normal distribution; it remains binomial with parameters $n$ and $p$. Rather, the CLT says that it is the centered and scaled sum $(S_n - np)/\sqrt{n}$, which is $(S_n - n/2)/\sqrt{n}$ for a fair coin, that converges in distribution to a normal random variable, and this is because $S_n = B_1 + B_2 + \cdots + B_n$ where $B_i \sim \operatorname{Bernoulli}(p)$ are the underlying iid variables in the theorem.

However, this inaccuracy is not fatal. We can still appeal to the CLT to estimate the boundary of the rejection region, i.e., the critical value $\xi$ for the test statistic $|S - 500|$, where $S = S_{1000}$ is the number of heads in $n = 1000$ tosses. Keep in mind that the use of a normal approximation here is motivated by convenience, not necessity. We could use the exact distribution of $S$ to find the smallest integer $\xi$ such that $$\Pr[|S - 500| \le \xi] = \sum_{x = 500 - \xi}^{500 + \xi} \binom{1000}{x} 2^{-1000} \ge 1 - \alpha = 0.95.$$ In this way, the Type I error rate of the test will be at most $\alpha$. Such a calculation is easily tractable with a suitable computer program; for instance,

$$\begin{array}{c|c}
\xi & \Pr[|S - 500| \le \xi] \\ \hline
0 & 0.025225 \\
1 & 0.0755744 \\
\vdots & \vdots \\
30 & 0.946322 \\
31 & 0.953709 \\
32 & 0.960221 \\
\vdots & \vdots \\
500 & 1.
\end{array}$$

But if we do not have such a program, then it is the normal approximation to the binomial that we may use. The idea is that $S$ is approximately normal with mean $\mu = np = 500$, and standard deviation $\sigma = \sqrt{np(1-p)} = 5 \sqrt{10} \approx 15.8114$. Then

$$\begin{align}
\Pr[|S - 500| \le \xi] &\approx \Pr\left[\frac{- \xi}{15.8114} \le \frac{S - \mu}{\sigma} \le \frac{\xi}{15.8114}\right] \\
&\approx \Phi\left(\frac{\xi}{15.8114}\right) - \Phi\left(\frac{- \xi}{15.8114}\right) \\
&= 1 - 2\Phi\left(-\frac{\xi}{15.8114}\right),
\end{align}$$

where $\Phi$ is the CDF of the standard normal distribution. Setting this to $1 - \alpha$ then gives us $$ \xi = -(15.8114)\Phi^{-1}(\alpha/2)$$ where $\Phi^{-1}$ is the inverse CDF or quantile function. For $\alpha = 0.05$, $\Phi^{-1}(\alpha/2) \approx -1.95996$, therefore $$\xi \approx 30.9898 \approx 31.$$
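For anyone who wants to check both routes numerically, here is a minimal sketch in Python using only the standard library (`math.comb` and `statistics.NormalDist`, both available from Python 3.8). The function names `coverage`, `exact_critical_value` and `normal_critical_value` are made up for this illustration, and the code assumes a fair coin ($p = 1/2$) and an even $n$.

```python
from math import comb, sqrt
from statistics import NormalDist


def coverage(xi, n=1000):
    """Exact P(|S - n/2| <= xi) for S ~ Binomial(n, 1/2); assumes n is even."""
    half = n // 2
    lo, hi = max(0, half - xi), min(n, half + xi)
    # Count favourable outcomes with exact integers, then divide once by 2^n.
    favourable = sum(comb(n, x) for x in range(lo, hi + 1))
    return favourable / 2**n


def exact_critical_value(n=1000, alpha=0.05):
    """Smallest integer xi with exact coverage at least 1 - alpha."""
    xi = 0
    while coverage(xi, n) < 1 - alpha:
        xi += 1
    return xi


def normal_critical_value(n=1000, alpha=0.05):
    """Normal-approximation critical value: xi = -sigma * Phi^{-1}(alpha / 2)."""
    sigma = sqrt(n) / 2  # sd of S for a fair coin: sqrt(n * 1/2 * 1/2)
    return -sigma * NormalDist().inv_cdf(alpha / 2)


if __name__ == "__main__":
    print(exact_critical_value())   # exact binomial critical value
    print(normal_critical_value())  # normal-approximation critical value
```

Under these assumptions, `exact_critical_value()` should agree with the table (the first $\xi$ with coverage at least $0.95$ is $31$) and `normal_critical_value()` should come out near $30.99$. Replacing `sqrt(n) / 2` by `sqrt(n) / 4`, as in the question's $Z_n$, halves the result and gives roughly $15.495$: $1/4$ is the variance of a single fair toss, whereas its standard deviation is $1/2$.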