Different methods for sample size estimates of a binomial


In undergrad, I learned to estimate the minimum sample size required to estimate the parameter of a binomial distribution using Chebyshev's inequality [1]. That method gives the equation $ n \geq \frac{p(1 - p)}{\epsilon^2\delta} $, where $p$ is the bias of the coin, $\epsilon$ is the desired error between the estimate and the true value, and $\delta$ is the acceptable uncertainty (the probability that the error exceeds $\epsilon$).

While reading Thompson (1987) to find a way to generalize this to a multinomial distribution, I noticed the formula is a little different. When discussing the case of a binomial distribution, Thompson refers to Cochran and says that the binomial can be replaced by a normal approximation. In that case, the sample size estimate is

$ n \geq \frac{z^2 p(1-p)}{\epsilon^2} $, with the parameter definitions from above, and $z$ being the z-score for the desired confidence level.

What I don't understand is why these give different values. The second method gives lower sample size estimates. I follow how each formula is derived, but I don't understand the intuitive difference: why does using a normal approximation to the binomial distribution result in a smaller minimum sample size estimate?
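For concreteness, here is a small comparison of the two formulas. The example values $p = 0.5$, $\epsilon = 0.05$, $\delta = 0.05$ (so $z \approx 1.96$ for 95% confidence) are my own choices, not from either reference:

```python
import math

def n_chebyshev(p, eps, delta):
    # Chebyshev-based bound: n >= p(1-p) / (eps^2 * delta)
    return math.ceil(p * (1 - p) / (eps**2 * delta))

def n_normal(p, eps, z):
    # Normal-approximation bound: n >= z^2 * p(1-p) / eps^2
    return math.ceil(z**2 * p * (1 - p) / eps**2)

p, eps, delta = 0.5, 0.05, 0.05
z = 1.959963984540054  # two-sided 95% quantile of the standard normal

print(n_chebyshev(p, eps, delta))  # 2000
print(n_normal(p, eps, z))         # 385
```

With the same error tolerance and 95% coverage, Chebyshev demands roughly five times as many samples here, which is the gap the question is about.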

Best answer:

Yup, the more you assume, the more you can say. The normal-approximation bound is in the spirit of Hoeffding's inequality, which uses the assumption that $X$ is sub-Gaussian, that is, $$P(\lvert X - \mu\rvert \geq t) \leq c P(\lvert Z \rvert \geq t)$$ for some absolute constants $c,\sigma$, where $Z\sim \mathcal N(0,\sigma^2)$. The tails of the normal are extremely light. In fact, if $Z$ is standard normal, then $$P(Z\geq t) \underset{t\to \infty}{\sim}\frac{1}{t}\frac{1}{\sqrt{2\pi}}e^{-t^2/2}$$

More concretely, the normal has about 95% of its mass within 2 standard deviations of its mean, whereas Chebyshev only guarantees that at least 75% of the mass lies within 2 standard deviations. This is to be expected, as the assumption behind Chebyshev is very weak, requiring only a finite second moment. The finiteness of moments is directly related to the rate of decay of tail probabilities, which can be seen from the following identity, valid for any random variable $X\geq 0$: $$E X = \int_0^{\infty}P(X>t)\,dt$$ Consequently, substituting $t = u^p$, $$E \lvert X\rvert ^p = \int_0^{\infty}P(\lvert X\rvert ^p > t)\,dt = \int_0^{\infty}P(\lvert X\rvert>u)\,pu^{p-1}\,du$$

So Hoeffding's inequality is kind of like a non-asymptotic central limit theorem (not really, though). It tells you that if your random variable of interest has tails as light as a normal's, i.e. it is sub-Gaussian, then you also get very good, Gaussian-like concentration of measure around the mean.
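The "95% vs. at least 75%" comparison can be tabulated for a few multiples of the standard deviation. This is a quick sketch using the standard normal CDF via `math.erf`; note the Chebyshev column is a worst-case lower bound over all finite-variance distributions, not an exact probability:

```python
import math

def normal_mass_within(k):
    # Exact P(|Z| <= k) for Z ~ N(0, 1), via the error function
    return math.erf(k / math.sqrt(2))

def chebyshev_mass_within(k):
    # Chebyshev guarantees P(|X - mu| <= k*sigma) >= 1 - 1/k^2
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(k, round(normal_mass_within(k), 4), round(chebyshev_mass_within(k), 4))
```

The gap between the two columns is exactly the slack that makes the Chebyshev-based sample size so much larger.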

It's also insightful to compare Hoeffding with the central limit theorem. The issue with the central limit theorem is that it is an asymptotic statement, and in general, convergence of the normalized sample mean to the standard Gaussian can be very slow. So even though the central limit theorem has the same assumption as Chebyshev, namely a finite second moment, and the normal approximation gives tighter concentration inequalities than Chebyshev, one may not be able to trust the normal approximation because of slow convergence. If your summands have a finite third moment, then there is a quantitative, non-asymptotic form of the central limit theorem (the Berry-Esseen theorem) which says that the approximation error in the distribution of the normalized sample mean is of the order $$\sup_x \lvert F_n(x) - \Phi(x) \rvert = O(1/\sqrt n)$$ where $F_n$ is the CDF of the normalized sample mean and $\Phi$ is the standard normal CDF. For more on this, Wainwright's High Dimensional Statistics: A Non-Asymptotic Viewpoint is very nice.
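The $O(1/\sqrt n)$ rate can be checked numerically for the coin-flipping case in the question. For a Bernoulli(1/2) sum the distance $\sup_x \lvert F_n(x) - \Phi(x)\rvert$ is computable exactly, since $F_n$ is a step function whose supremum distance to $\Phi$ is attained at its jump points. This is my own illustrative computation, not from Thompson or Cochran:

```python
import math

def phi(x):
    # Standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def kolmogorov_distance(n, p):
    # sup_x |F_n(x) - Phi(x)| for the standardized Binomial(n, p) sum.
    # F_n is a step function, so it suffices to check both sides of each jump.
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    cdf, dist = 0.0, 0.0
    for k in range(n + 1):
        t = (k - mu) / sigma
        dist = max(dist, abs(cdf - phi(t)))   # left limit at the jump
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        dist = max(dist, abs(cdf - phi(t)))   # value at the jump
    return dist

for n in (10, 100, 1000):
    print(n, round(kolmogorov_distance(n, 0.5), 4))
```

Each 100-fold increase in $n$ shrinks the distance by roughly a factor of 10, consistent with the $1/\sqrt n$ rate.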