How can I determine the sample size in interval estimation? The ordinary way of determining the sample size $n$ is the following:
- Let $n$ be large enough for the Central Limit Theorem to apply.
- Choose a permitted error $\epsilon$ arbitrarily; it corresponds to the half-width of the confidence interval (CI).
- For a 95% confidence level ($z = 1.96$), set $1.96 \cdot \sigma / \sqrt{n} = \epsilon$, where $\sigma^2$ is the population variance.
- Solve this for $n$: $n = (1.96 \cdot \sigma / \epsilon)^2$.
- This is the estimate of the sample size that gives a 95% CI with half-width $\epsilon$.
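The steps above can be sketched as a small function (a minimal illustration, assuming $\sigma$ is known; the function name is mine):

```python
import math

def sample_size(sigma, epsilon, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) <= epsilon.

    sigma:   population standard deviation (assumed known here)
    epsilon: permitted half-width of the confidence interval
    z:       critical value of the standard normal (1.96 for 95%)
    """
    # Solving z * sigma / sqrt(n) = epsilon for n and rounding up:
    return math.ceil((z * sigma / epsilon) ** 2)

# e.g. sigma = 10, epsilon = 1: n = ceil((1.96 * 10 / 1)^2) = ceil(384.16) = 385
n = sample_size(10, 1)
```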
What I want to know is how to calculate the population variance $\sigma^2$ that appears in $n = (1.96 \cdot \sigma / \epsilon)^2$ when the population distribution is not given (not necessarily Gaussian) and the population variance $\sigma^2$ is unknown.
Since the population distribution is not given, we cannot use the t-distribution. So we have to calculate the unbiased sample variance $s^2$ and use it as an estimate of the population variance. However, calculating $s^2$ requires a sample, whose size $n$ is exactly what we are trying to determine. The argument is circular.
How can I estimate the sample size when the population distribution is not given (not necessarily Gaussian) and the population variance $\sigma^2$ is unknown?
The central limit theorem applies only if the random variables involved are independent and identically distributed with finite variance.
In that case, if all you know is that the population variance is finite, but you don't know an upper bound on that variance, then it is in general impossible to determine a sample size that achieves a given confidence interval. The same applies if the population random variable has finite moments but their bounds are unknown.
See Kunsch et al. 2018, Theorem 3.4. The proof boils down to the problem of distinguishing two Bernoulli random variables (which both have finite variance and moments).
See also the following question, whose answer shows a similar impossibility result:
For Bernoulli and other bounded variables, you can determine a sample size using Chebyshev's inequality or Hoeffding's inequality; see the following:
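To make this concrete, here is a small sketch of both bounds (function names are mine). For i.i.d. variables bounded in $[a, b]$, Hoeffding's inequality gives $P(|\bar{X} - \mu| \ge \epsilon) \le 2 e^{-2 n \epsilon^2 / (b-a)^2}$, and Chebyshev's inequality gives $P(|\bar{X} - \mu| \ge \epsilon) \le \sigma^2 / (n \epsilon^2)$; setting each right-hand side to at most $\delta$ and solving for $n$:

```python
import math

def n_hoeffding(epsilon, delta, a=0.0, b=1.0):
    """Smallest n with P(|sample mean - mu| >= epsilon) <= delta
    for i.i.d. variables bounded in [a, b], via Hoeffding:
    2 * exp(-2 n eps^2 / (b - a)^2) <= delta."""
    return math.ceil((b - a) ** 2 * math.log(2 / delta) / (2 * epsilon ** 2))

def n_chebyshev(epsilon, delta, var_bound=0.25):
    """Same guarantee via Chebyshev, given an upper bound on the
    variance (1/4 is the worst case for a Bernoulli variable):
    var_bound / (n eps^2) <= delta."""
    return math.ceil(var_bound / (delta * epsilon ** 2))

# For a Bernoulli variable with epsilon = 0.05, delta = 0.05:
# Hoeffding needs 738 samples; Chebyshev needs 2000.
```

Note how much tighter Hoeffding's exponential bound is than Chebyshev's polynomial one; both, however, require the boundedness (or a variance bound) that the general problem lacks.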
If you can accept a weaker guarantee on confidence intervals (for example, "within a reasonably tight interval of the true mean with probability at least $1-\delta$" rather than the stronger "within $\epsilon$ of the true mean with probability at least $1-\delta$"), then recent work offers algorithms that estimate the mean under the sole assumption that the mean is finite (or that some moment between the first and second is finite) (Cherapanamjeri et al. 2020). However, these algorithms are far from trivial.
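These modern algorithms build on the classical median-of-means estimator, which is much simpler and already robust to heavy tails; a minimal sketch (not the Cherapanamjeri et al. algorithm itself):

```python
import random
import statistics

def median_of_means(xs, k):
    """Median-of-means: split the sample into k equal groups, average
    each group, and return the median of the group means. A few extreme
    observations can corrupt only a few group means, and the median
    ignores them."""
    random.shuffle(xs)  # groups must not depend on the input order (mutates xs)
    m = len(xs) // k
    means = [statistics.fmean(xs[i * m:(i + 1) * m]) for i in range(k)]
    return statistics.median(means)

# One huge outlier among 100 points lands in exactly one of the 5 groups,
# so it shifts one group mean but not the median of the group means.
estimate = median_of_means([1.0] * 99 + [1000.0], k=5)
```

The plain sample mean of this data is about 10.99, while the median-of-means estimate stays at 1.0 no matter which group the outlier falls into.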
REFERENCES: