How to determine a sample size to get accurate estimates of a given data set?

193 Views Asked by Bumbble Comm At 10 May 2026 - 9:55

I have a question of a statistical nature; I think there should be some standard theory about this issue.

Suppose I have a large data set of $N$ items, of which $K<N$ are unwanted. I am interested in finding the value of $K$. Testing all items takes too much time, so I want to determine a suitable sample size $n<N$ of randomly selected items from the data set.

Suppose I just pick a value for $n$.

Then, in a random sample of size $n$, I search for the unwanted items and find $k\leq n$ of them. Let the number of unwanted items in the sample be a test statistic $T$, i.e. I will test on the probability $P(T \geq k)$. I can now find the smallest integer value $K_\min$ such that for the estimate $K = K_\min$ we have $P(T \geq k) \geq \alpha$. That is, for any smaller integer estimate $K<K_\min$ we have $P(T \geq k) < \alpha$. If I am correct, I can now state that at significance level $\alpha$ we have $K \geq K_\min$. Is that true?

If this is true, the question now is: How accurate is this lower bound?

This is also my main question. Based on the sample size $n$ and the significance level $\alpha$, what can we say about the accuracy of $K_\min$? In other words, can we determine some confidence interval for $K$ in terms of $K_\min$ and $\alpha$?

Any tips or other approaches are very much appreciated!

Best, Koen

Edit 26 November:

Another formulation of the problem as mentioned by David K is as follows:

Given some "error" tolerance $\varepsilon$, how do we choose $n$ for a given $\alpha$ such that we can guarantee that $|K_\min - K|/N\leq \varepsilon$ (or some assurance like that)?

Bumbble Comm On 31 Oct 2017 - 5:57

I suppose that when you compute $P(T \geq k),$ you have in mind a probability distribution of $T$ based on sampling $n$ items from $N$ of which $K$ are "unwanted". A hypergeometric distribution seems to fit the requirements.
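For reference, here is the tail probability written out under that assumption, i.e. that the $n$ items are drawn uniformly at random without replacement, so that $T$ is hypergeometric with parameters $N$, $K$ and $n$ (terms with $n-t > N-K$ are taken to be zero):

$$P(T \geq k) = \sum_{t=k}^{\min(n,K)} \frac{\binom{K}{t}\binom{N-K}{n-t}}{\binom{N}{n}}.$$

For fixed $N$, $n$ and $k$ this probability does not decrease as $K$ grows, which is why a smallest $K$ with $P(T \geq k) \geq \alpha$ is well defined.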

In that light, I agree completely with your paragraph between "Suppose I just pick a value" and "If this is true".

It seems to me that then $K \geq K_\min$ is your confidence interval, that is, you have a one-sided confidence interval for $K$ (at confidence level $1-\alpha$) whose lower bound is $K_\min.$
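As a rough illustration of how $K_\min$ could be computed, here is a minimal Python sketch. It assumes the hypergeometric model (sampling without replacement) and uses only the standard library. The function names `tail_prob` and `lower_bound` and the numbers in the usage example are hypothetical and not taken from the question.

```python
from math import comb


def tail_prob(N, K, n, k):
    """P(T >= k) when T counts unwanted items in a sample of size n
    drawn without replacement from N items, K of which are unwanted."""
    if k <= 0:
        return 1.0
    upper = min(n, K)
    # Number of samples of size n containing at least k unwanted items.
    favourable = sum(comb(K, t) * comb(N - K, n - t) for t in range(k, upper + 1))
    return favourable / comb(N, n)


def lower_bound(N, n, k, alpha):
    """Smallest integer K such that P(T >= k | K) >= alpha."""
    if not (0 <= k <= n <= N):
        raise ValueError("need 0 <= k <= n <= N")
    if not (0 < alpha <= 1):
        raise ValueError("need 0 < alpha <= 1")
    # The tail probability is non-decreasing in K, so a linear scan from
    # K = k (there cannot be fewer unwanted items than were observed) works.
    for K in range(k, N + 1):
        if tail_prob(N, K, n, k) >= alpha:
            return K
    raise RuntimeError("unreachable: K = N always gives probability 1")


# Hypothetical example: 1000 items, sample of 100, 5 unwanted items found.
K_min = lower_bound(N=1000, n=100, k=5, alpha=0.05)
print("K_min =", K_min)
```

The linear scan is only meant to make the definition explicit; since the tail probability is monotone in $K$, a bisection search would do the same job faster for very large $N$.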
