Statistics. How are standard error and confidence intervals useful without knowing population size?


I understand standard error and confidence intervals as formulas, but not as concepts. Can you help me understand them better?

A smaller standard deviation (smaller spread of your data) and a larger sample size both give you a smaller standard error. That in turn gives you a narrower confidence interval.
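That relationship can be sketched in a few lines of Python (the numbers are hypothetical; `z = 1.96` is the usual critical value for a 95% interval). Note that the population size N never appears:

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """95% confidence interval from a sample mean, sample SD, and
    sample size n. The population size N is nowhere in the formula."""
    se = sd / math.sqrt(n)          # standard error shrinks as n grows
    return (mean - z * se, mean + z * se)

# Halving the SD, or quadrupling n, both halve the interval width.
lo, hi = confidence_interval(mean=50, sd=10, n=100)   # (48.04, 51.96)
```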

In layman's terms: as your data points cluster closer to the sample mean, and as your sample size (n) gets closer to your population size (N), you can be more confident that your sample statistic matches your population parameter.

But how do you calculate your confidence interval if you don't know your population size? An example I threw together in Excel:


You want to know how many sick days workers in your town take each year, on average. You survey companies and get responses for 36 workers. The mean for the 36 workers is 14.64 days (I'm rounding). The standard deviation is 9.30. That gives you a standard error of 1.55 and a 95% confidence interval of ±3.15.

You conclude, "I'm 95% sure that workers in our town take, on average, between roughly 11.5 and 17.8 sick days per year."
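The example's numbers can be reproduced directly. The ±3.15 is consistent with a t critical value of about 2.03 for 35 degrees of freedom (rather than the z value 1.96), which is presumably what Excel used:

```python
import math

# Reproducing the sick-days example: n = 36, mean = 14.64, SD = 9.30.
n, mean, sd = 36, 14.64, 9.30
se = sd / math.sqrt(n)      # 9.30 / 6 = 1.55
t = 2.030                   # t critical value, 95%, df = 35
margin = t * se             # ≈ 3.15
print(f"{mean - margin:.1f} to {mean + margin:.1f} sick days")
```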


But how do you know that estimate is even close? If your little town has only 100 workers, then a survey of 36 is pretty accurate. If you have 100,000 workers in your town, your sample is probably way off. The formulas for standard error and confidence interval (as well as standard deviation) don't have N in their calculations.

In many cases, you don't even know N (number of frogs in a national park; amount of drugs smuggled through an area; tons of ore in a mine). So how do you calculate (percent of frogs with a disease; percentage of drugs stopped; quantity of ore per ton of rock) without knowing N? Is a thousand frogs sufficient? Is a hundred bricks of pot a good job? If we extract 16 tons, what do we get?

Corollary to this: if we know N, can we use it to adjust our statistics for n?

This is a repeat of How is it that the required sample size for a specified error and confidence is not dependent on population size?, but I don't grasp the concept of infinite populations.


BEST ANSWER

Here's the thing. Qualitatively, we all grok the Law of Large Numbers: we all understand the intuitive idea that as the sample size increases, the observed percentage is more likely to approximate the actual percentage of the target group as a whole. For example, flipping an unbiased coin 10 times and getting 7 heads is not weird ... but flipping an unbiased coin 100 times and getting 70 heads is weird ... and getting 700 heads in 1000 flips makes it a near-certainty that we are dealing with a biased coin.
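The coin intuition can be made exact with the binomial tail probability, a small sketch:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(at least k heads in n flips of a coin with heads-probability p),
    computed as an exact binomial tail sum."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 7/10 heads is routine; 70/100 is rare; 700/1000 is essentially impossible.
p10   = prob_at_least(7, 10)     # ≈ 0.17
p100  = prob_at_least(70, 100)   # < 0.0001
p1000 = prob_at_least(700, 1000) # astronomically small
```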

Unfortunately, what we're not so good at is the quantitative side of things. When, for example, we poll 1000 people about something, but this is out of a population of hundreds of millions, we feel that our sample couldn't possibly be anywhere near large enough to give us some kind of narrow confidence interval ... and yet that is exactly the case: with a sample size of 1000, it doesn't matter whether your target population size is 1 million, 1 billion, 1 trillion or, for that matter, infinite, the margin of error will only be about 3%! That is, we are surprised how narrow the confidence interval is, given how far removed the sample size is from the target population size.
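Where the 3% comes from: the standard worst-case formula for the margin of error of a sampled proportion is z·√(p(1−p)/n) with p = 0.5 (a sketch, not tied to any particular poll):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a sampled proportion.
    The population size does not appear anywhere in the formula."""
    return z * math.sqrt(p * (1 - p) / n)

# n = 1000 gives roughly 3% whether N is a million or infinite.
moe = margin_of_error(1000)   # ≈ 0.031, i.e. about 3 percentage points
```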

But the thing is, the target population size really doesn't matter. Well, it matters if the target is close to the sample size, e.g. if the sample is 1000 and the target is 2000, then the margin of error will actually start to visibly decrease ... (i.e. it's getting even smaller yet!) ... but once the target is 'far enough' removed from the sample size, it really doesn't matter whether it's billions or quadrillions or infinite.
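This shrinking effect (and the corollary in the question, "if we know N, can we use it?") is the finite population correction: multiply the standard error by √((N−n)/(N−1)). A hedged sketch:

```python
import math

def fpc_standard_error(sd, n, N):
    """Standard error with the finite population correction (FPC):
    (sd/sqrt(n)) * sqrt((N - n)/(N - 1)). As N grows, the correction
    factor approaches 1, which is why N drops out for large populations."""
    return (sd / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

se_small = fpc_standard_error(10, 1000, 2000)    # factor ≈ 0.707: visibly smaller
se_huge  = fpc_standard_error(10, 1000, 10**9)   # factor ≈ 1.0: as if infinite
```

So yes: when N is known and not much bigger than n, the FPC tightens the interval; when N is huge or unknown, the uncorrected formula is already essentially right.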

One way to see this is to go back to the coins. Here, we actually did a good bit better: we know that with 1000 flips, we should get pretty close to the actual percentage with which this coin comes up heads. But what is the 'target population' here? Well, it's effectively infinite: a biased coin that comes up heads 70% of the time, comes up heads 70% of the time when flipped infinitely many times. But if we flip it a 'mere' 1000 times, you and I know we should be getting pretty darn close to 70% as well. Or, more to the point, if we got 70% heads in 1000 flips, then we'll get somewhere near 70% for infinite flips as well.