Lower bound on sample size to determine coin fairness

142 Views Asked by At

This is a conclusion I came to through very basic knowledge of statistics and error analysis.

Suppose we want to determine the fairness of a coin. To do this, we need to determine the probability of the coin landing on either Heads or Tails and we do this through repeated flipping. We have an amount of uncertainty we are willing to allow in our measurement of the probability. We define the variable $f$ as follows: $$f = \begin{cases} 1, & \text{if the coin lands on Heads} \\ 0, & \text{if the coin lands on Tails} \end{cases}$$ Then our measurement of the probability $P(H)$ of the coin landing on heads is equal to the sample average of $f$, $\bar f$. The formula for the standard error of the measurement of $f$ through $n$ samples is: $$\Delta f = \frac{\sigma_f}{\sqrt{n}}$$ Now, since $f$ can only have values $0$ and $1$, its standard deviation cannot be greater than $\frac{1}{2}$. So we have: $$\Delta f < \frac{1}{2\sqrt{n}}$$ So if we flip the coin $n$ times, we can be pretty confident our error in estimating the probability is at most $\frac{1}{2\sqrt{n}}$. Rearranging: $$n<\frac{1}{4(\Delta f)^2}$$ So to measure the probability with an error of $\Delta f$ we can be pretty confident we'll need to flip the coin $\frac{1}{4(\Delta f)^2}$ times at most. Then for example, if I want to determine the probability of the coin landing of Heads within an error of 5%, I'll need to flip it at most 100 times.

Is this reasoning correct?

1

There are 1 best solutions below

0
On

Let me first formally express the way commonly used to describe the accuracy of the point estimator $\bar{F_n}$ for the parameter $f$.

Let the following inequality hold:

$$\mathbb P \left (|\text{error}_n|:=|\bar{F_n}-f| < \epsilon \right) \ge \delta.$$

Then, we say that the probability of that the error of estimating $f$ using $\bar{F}$ is less than $\epsilon>0$ is at least $\delta$ (confidence level).

Now let us find out how $n$, $\epsilon>0$, and $\delta$ are related. One way is to use Chebyshev inequality. Using this inequality, and noting that $f(1-f) \le \frac{1}{4}$ for $f \in [0,1]$, we obtain

$$\mathbb P \left (|\text{error}_n| < \epsilon \right) \ge 1-\frac{1}{4n\epsilon^2}.$$

From this, we can make the following key observations:

  1. When $\epsilon>0$ and $\delta$ are given, if $n\ge \frac{1}{4(1-\delta)\epsilon^2}$ then $\mathbb P \left (|\text{error}_n| < \epsilon \right) \ge \delta $.

  2. When $n$ and $\epsilon>0$ are given, then $\delta$ is at least $1-\frac{1}{4n\epsilon^2}$, i.e, $\mathbb P \left (|\text{error}_n| < \epsilon \right) \ge \delta $ holds for some $\delta \ge 1-\frac{1}{4n\epsilon^2}.$

  3. When $n$ and $\delta>0$ are given, then $\epsilon$ is at most $ \frac{1}{2\sqrt{(1-\delta)n}}$, i.e, $\mathbb P \left (|\text{error}_n| < \epsilon \right) \ge \delta $ holds for some $\epsilon \le \frac{1}{2\sqrt{(1-\delta)n}}$.

Using the above observations, your statements can be corrected/improved.


Your first statement in the OP

If we flip the coin $n$ times, we can be pretty confident our error in estimating the probability is at most $\frac{1}{2\sqrt{n}}$.

can be corrected/improved as follows (using the third observation):

If we flip the coin $n$ times we can be pretty confident (with probability $\delta$ ) that our error in estimating the probability $f$ is at most $\frac{1}{2\sqrt{(1-\delta)n}}$.


Your next statement in the OP

To measure the probability with an error of $\Delta f$ we can be pretty confident we'll need to flip the coin $\frac{1}{4(\Delta f)^2}$ times at most.

can be corrected/improved as follows (using the first observation):

To estimate the probability $f$ with an error of $\Delta f$ we can be pretty confident (with probability $\delta$ ) if we flip the coin $\frac{1}{4(1-\delta){(\Delta f})^2}$ times at least.


The last statement in the OP

If I want to determine the probability of the coin landing of heads within an error of 5%, I'll need to flip it at most 100 times.

can be corrected/improved as follows (using the first observation):

If I want to determine the probability of the coin landing of heads within an error of 5% and confidence $\delta$, I'll need to flip it at least $\frac{100}{(1-\delta)}$ times.


Note that the value of confidence $\delta$ considerably changes the minimum sample size. For example, If "pretty confident" by you is translated to the confidence level of $\delta=0.99$, then to have a maximum error of $\Delta f=0.05$, we need to flip the coin at least 10000 times not 100 times, which are obtained from the correct $\frac{1}{4(1-\delta){(\Delta f})^2}$ and incorrect $\frac{1}{4{(\Delta f})^2}$ formulas, respectively.