How to find a large enough number of trials in a binomial distribution based on a desired standard error?


I have a question about the number of trials in a binomial distribution. Suppose we have a coin whose fairness is unknown to us, and we want to know the probability of observing heads on a single toss. If we toss the coin $n$ times and observe $k$ heads, the maximum likelihood estimate (MLE) of this probability is $$p = \frac{k}{n}$$ Let the actual probability be $P$. As we increase the number of trials $n$, the standard error of $p$ as an estimate of $P$ decreases. Now, suppose we would like $|P - p| \le \alpha$, i.e. we want the error between the actual probability and the MLE to be at most some value, for example $\alpha = 0.01$. My question is: how do we find the number of trials $n$ such that $|P - p| \le \alpha$? I'm not a statistician, so if I've used any wrong term in my question, please suggest the correct alternative. Also, please suggest some useful references, as I would like to learn more in this area.


Best answer

You can't guarantee what you described in general, since by chance your estimate can always end up far from the true value (imagine a fair coin landing heads on every one of the $n$ trials: it is unlikely, but possible).
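
To see that the probability of a large deviation is positive for every $n$ but shrinks as $n$ grows, one can compute it exactly from the binomial distribution. This is a minimal sketch, assuming $0 < P < 1$; `deviation_probability` is a hypothetical helper name chosen for illustration:

```python
from math import exp, lgamma, log


def deviation_probability(n: int, P: float, alpha: float) -> float:
    """Exact probability that |k/n - P| >= alpha when k ~ Binomial(n, P).

    Assumes 0 < P < 1 so that log(P) and log(1 - P) are defined.
    """
    if not 0 < P < 1:
        raise ValueError("P must lie strictly between 0 and 1")
    total = 0.0
    for k in range(n + 1):
        # Small tolerance so floating-point rounding at the boundary
        # k/n = P +/- alpha does not drop a term that should count.
        if abs(k / n - P) >= alpha - 1e-12:
            # Binomial pmf computed in log space to avoid overflow for large n.
            log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                       + k * log(P) + (n - k) * log(1 - P))
            total += exp(log_pmf)
    return total


# A fair coin: the deviation probability never reaches zero, but it shrinks with n.
for n in (10, 100, 1000):
    print(n, deviation_probability(n, 0.5, 0.05))
```

For example, `deviation_probability(10, 0.5, 0.5)` is the all-heads or all-tails case, $2 \cdot 0.5^{10}$.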

Usually we instead try to collect enough samples that our estimate will be close to the true value with high probability. One way to guarantee this is with concentration inequalities. Since $p$ is the average of $n$ independent Bernoulli random variables, each taking values in $[0, 1]$, Hoeffding's inequality gives

$$\mathbb P (|P - p| \geq \alpha) \leq 2\exp\left(- 2n\alpha^2\right)$$

So given a chosen value for $\alpha$ and a chosen maximum acceptable deviation probability $\delta$, we can solve $2\exp(-2n\alpha^2) \leq \delta$ for $n$, which gives $n \geq \frac{\ln(2/\delta)}{2\alpha^2}$. For more details see: https://en.wikipedia.org/wiki/Hoeffding%27s_inequality#Confidence_intervals
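
Solving the Hoeffding bound for $n$ takes one line of code. This is a minimal sketch, assuming the two-sided form $\mathbb P(|P - p| \geq \alpha) \leq 2\exp(-2n\alpha^2)$; `hoeffding_sample_size` and the failure probability `delta` are hypothetical names chosen for illustration:

```python
import math


def hoeffding_sample_size(alpha: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * alpha**2) <= delta.

    alpha: tolerated deviation |P - p|; delta: acceptable probability of exceeding it.
    """
    if not (0 < alpha < 1 and 0 < delta < 1):
        raise ValueError("alpha and delta must lie in (0, 1)")
    # Rearranging 2 * exp(-2 n alpha^2) <= delta gives n >= ln(2/delta) / (2 alpha^2).
    return math.ceil(math.log(2 / delta) / (2 * alpha ** 2))


print(hoeffding_sample_size(alpha=0.01, delta=0.05))  # 18445
```

The bound makes no assumption about the value of $P$ and holds for every $n$, not just asymptotically, which is why the resulting sample size is conservative.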

A more common approach is to apply the central limit theorem, approximating the normalized deviation by a Gaussian random variable, and use that to determine the number of samples:

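When the number of samples is large, we know that $\sqrt n (P - p)$ is close in distribution to a $\mathcal N (0, P(1-P))$ random variable. So the same techniques used to select the number of samples for normal (Gaussian) confidence intervals also apply. In particular, if $z$ is the standard normal quantile for the desired confidence level (e.g. $z \approx 1.96$ for 95%), requiring $z\sqrt{P(1-P)/n} \leq \alpha$ gives $n \geq \frac{z^2 P(1-P)}{\alpha^2}$, and since $P(1-P) \leq 1/4$, taking $n \geq \frac{z^2}{4\alpha^2}$ suffices under this approximation whatever the value of $P$.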
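
The normal-approximation sample size can be sketched in the same way. This assumes the formula $n \geq z^2 P(1-P)/\alpha^2$ with the conservative default $P = 1/2$; `normal_sample_size` is a hypothetical helper, and `statistics.NormalDist` is Python's standard-library normal distribution (Python 3.8+):

```python
import math
from statistics import NormalDist


def normal_sample_size(alpha: float, confidence: float = 0.95, P: float = 0.5) -> int:
    """Smallest n with z * sqrt(P * (1 - P) / n) <= alpha (normal approximation).

    P = 0.5 maximises P * (1 - P), so the default gives the worst case.
    """
    # Two-sided standard normal quantile, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * P * (1 - P) / alpha ** 2)


print(normal_sample_size(0.01))  # 9604
```

For $\alpha = 0.01$ at 95% confidence this gives 9604 tosses, fewer than the Hoeffding bound requires for the same $\alpha$ and a 5% failure probability. The price is that it relies on the central limit approximation being accurate, which is poorest when $P$ is close to 0 or 1.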

You might find this useful: https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Foundations_in_Statistical_Reasoning_(Kaslik)/06%3A_Confidence_Intervals_and_Sample_Size (in particular the last section on this page)