What qualifies as an infinite population?


I've been looking for a clear guideline to distinguish between a finite population and an infinite one. For example, in some places an infinite population is described as something like the number of stars in the universe, and in other places as something far smaller, like the number of products in the market. I'm studying high school math, so I'd like an answer at that level. Thanks!


As @RyanA has commented, the only truly infinite populations are abstract theoretical ones. For practical statistical purposes, it is sometimes feasible to treat a sufficiently large finite population as if it were infinite. Just when this is reasonable depends on the situation and the probability model you are using. Here are two examples that illustrate the importance of context and model.

Binomial vs. Hypergeometric. Suppose you are sampling $n$ subjects from some population and want a probability model for the number $X$ of women in the sample. You might model $X$ as binomial if the population is infinite and as hypergeometric if the population is finite. Suppose you wonder if the proportion $p$ of women is 50% (might be true if you're sampling from among 'people in general' and false if you're sampling from among 'kindergarten teachers'). Suppose you are sampling at random, so each individual is chosen fairly and without regard to others already chosen.

If $n = 100,$ $p = 0.5,$ and the population is infinite, then $X \sim \mathsf{Binom}(n = 100, p = .5).$ If you want to know $P(X \le 45)$ then you can use the binomial formula, binomial tables, a normal approximation, or software to find that $P(X \le 45) = 0.1841.$ (The computation from R statistical software is shown below.)

pbinom(45, 100, .5)
## 0.1841008
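The other routes mentioned above give essentially the same number. As a rough sketch (the variable names here are my own), you can add up the binomial formula term by term, or use a normal approximation with the same mean and standard deviation, evaluated at 45.5 to allow for the fact that $X$ takes only integer values:

```r
n <- 100; p <- 0.5

# Exact: add up the binomial probabilities P(X = 0), ..., P(X = 45)
exact <- sum(dbinom(0:45, n, p))

# Normal approximation with mean n*p and sd sqrt(n*p*(1-p)),
# evaluated at 45.5 (continuity correction)
approx <- pnorm(45.5, mean = n * p, sd = sqrt(n * p * (1 - p)))

c(exact = exact, approx = approx)
```

The two values should agree to about three decimal places, which is usually enough for practical purposes.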

By contrast, suppose the $n = 100$ subjects are chosen 'without replacement' (no one can be chosen more than once) from a finite population of size $T = 400$ with exactly $w = 200$ women and $m = 200$ men. Then $X \sim \mathsf{Hyper}(n = 100, T = 400, w = 200)$ and $P(X \le 45) = 0.1493,$ as shown below. This is noticeably different from the result for the binomial model. (Intuitively, as men get 'used up', the remaining population contains relatively more women, so women become more likely to be chosen; this compensating effect makes a result of 45 or fewer women less likely.)

phyper(45, 200, 200, 100)
## 0.1493418
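One way to see the 'compensating effect' in numbers is to compare the standard deviations of the two models. The sketch below uses the standard finite population correction factor $\sqrt{(T-n)/(T-1)}$, which is not part of the computation above; `T_pop` is just my name for the population size (plain `T` means `TRUE` in R).

```r
n <- 100; p <- 0.5; T_pop <- 400

# Standard deviation of the binomial model
sd_binom <- sqrt(n * p * (1 - p))

# Finite population correction for sampling without replacement
fpc <- sqrt((T_pop - n) / (T_pop - 1))
sd_hyper <- sd_binom * fpc

c(sd_binom = sd_binom, sd_hyper = sd_hyper)
```

The hypergeometric count is less spread out around 50, so values as low as 45 are less likely than under the binomial model.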

However, if $n = 100$ and the population is finite of size $T = 4000$ with exactly $w = 2000$ women, then $X \sim \mathsf{Hyper}(n = 100, T = 4000, w = 2000)$ and $P(X \le 45) = 0.1811,$ which is not much different from the binomial result for practical purposes.

phyper(45, 2000, 2000, 100)
## 0.18106

A rough rule of thumb in these circumstances is that one can regard the population as "infinite" if it is more than 10 times the sample size. In the examples above, $T = 400$ is only 4 times the sample size, while $T = 4000$ is 40 times the sample size. (Depending on how 'fussy' they are about approximations, various authors may give somewhat different rules of thumb about the ratio of the population size to the sample size.)
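To get a feel for this rule of thumb, here is a small sketch that repeats the hypergeometric computation for several population sizes, always with half women. The particular ratios are my own choice for illustration.

```r
n <- 100
ratios <- c(4, 10, 20, 40, 100)      # population size / sample size
T_pop <- ratios * n

# phyper(q, m, n, k): m women, n men, k subjects drawn
hyper <- phyper(45, T_pop / 2, T_pop / 2, n)
binom <- pbinom(45, n, 0.5)

data.frame(ratio = ratios, T = T_pop, hyper = round(hyper, 4),
           diff_from_binom = round(binom - hyper, 4))
```

The gap between the two models shrinks steadily as the ratio grows; how small a gap counts as 'small enough' is a judgment call.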

The figure below shows binomial probabilities as bars, hypergeometric probabilities ($T$ = 400) as orange X's, and hypergeometric probabilities ($T$ = 4000) as blue circles. The desired probability is the sum of probabilities to the left of the vertical dotted red line.

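[Figure: binomial probabilities (bars) with hypergeometric probabilities for $T$ = 400 (orange X's) and $T$ = 4000 (blue circles)]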
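A figure along these lines can be drawn with base R graphics. This is only a sketch: the plotting range, line widths, and axis labels are my guesses, not necessarily those of the original figure.

```r
n <- 100
x <- 30:70                                   # range where the probabilities are visible

pb  <- dbinom(x, n, 0.5)                     # binomial (infinite population)
ph1 <- dhyper(x, 200, 200, n)                # hypergeometric, T = 400
ph2 <- dhyper(x, 2000, 2000, n)              # hypergeometric, T = 4000

plot(x, pb, type = "h", lwd = 3, ylim = c(0, max(ph1)),
     xlab = "Number of women in sample", ylab = "Probability")
points(x, ph1, pch = 4, col = "orange")      # X's
points(x, ph2, pch = 1, col = "blue")        # circles
abline(v = 45.5, lty = "dotted", col = "red")
```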

Discrete vs. Continuous. Suppose you are interested in modeling weights of members of male college swimming teams. Perhaps you know that the mean weight is around 70 kg and the standard deviation is around 7 kg. If you are rounding weights to the nearest kilogram, then there may be something like forty different feasible weights. You might then make a bar chart showing the observed (integer) weights of 100 swimmers sampled. Viewing the rounded weights as the 'population' of possible values, you can surely consider that population finite.

From the tally of weights in the bar chart, you might find that about 70% of them are within one standard deviation of the mean ($\mu \pm \sigma$), that is, in the interval $[63,77]$.

A more traditional approach might be to model swimmers' weights as approximately $W \sim \mathsf{Norm}(\mu=70, \sigma=7).$ The normal distribution is continuous: for example, just between 70 kg and 71 kg there are infinitely many conceivable weights. But you are not likely to be interested in that degree of precision.

However, taking rounding into account, you might use the normal model to say that $P(62.5 < W < 77.5) = 0.7160.$

diff(pnorm(c(62.5,77.5), 70, 7))
## 0.7160232
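To see what 'taking rounding into account' changes, the sketch below (variable names are mine) compares the probability for the interval with endpoints 63 and 77 taken literally against the adjusted interval from 62.5 to 77.5, which covers every true weight that would round to an integer from 63 through 77.

```r
mu <- 70; sigma <- 7

# Treating 63 and 77 as exact endpoints (no adjustment for rounding)
plain <- pnorm(77, mu, sigma) - pnorm(63, mu, sigma)

# Rounded weights 63..77 correspond to true weights in (62.5, 77.5)
adjusted <- pnorm(77.5, mu, sigma) - pnorm(62.5, mu, sigma)

c(plain = plain, adjusted = adjusted)
```

The unadjusted value is the familiar 'about 68% within one standard deviation' figure; the adjusted value is a little larger because the interval is half a kilogram wider on each side.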

The histogram below shows a sample of 100 weights rounded to integers, of which exactly 71 happen to fall in the interval $[63,77]$. The density function of $\mathsf{Norm}(70,7)$ is shown for comparison.

[Figure: histogram of 100 rounded weights with the density function of $\mathsf{Norm}(70,7)$]
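If you want to try this yourself, here is a sketch that simulates 100 rounded weights from the normal model and counts how many land in $[63,77]$. The simulated data are not the sample described above, and the seed is arbitrary, so the count you get will generally differ from 71.

```r
set.seed(2024)                              # arbitrary seed, for reproducibility only
w <- round(rnorm(100, mean = 70, sd = 7))   # 100 simulated weights, rounded to whole kg

inside <- sum(w >= 63 & w <= 77)            # how many fall in [63, 77]
inside / 100                                # sample proportion; compare with 0.716

# Histogram with one bar per integer weight, on the density scale
hist(w, breaks = seq(min(w) - 0.5, max(w) + 0.5, by = 1), freq = FALSE,
     main = "", xlab = "Weight (kg)")
curve(dnorm(x, 70, 7), add = TRUE)          # normal density for comparison
```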