It is frequently stated that there are two formulas for calculating variance: one for a sample, i.e. when only a subset of a larger data set is available, and one for when the whole data set is available.
Here's a reference describing just this.
Taken from the above reference, here is an example:
For example, if we take ten words at random from this page to calculate the variance of their length, a sample variance would be needed. To find the population variance, the length of every word on the page would be needed.
Imagine a book with a page that contains 100 words.
There are two equations we could use to calculate the variance of the length of the words.
$$\sigma^2=\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$$
or
$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
If $n=N-1$, we could imagine using the second formula to calculate a variance. In this case, we would be selecting 99 of the 100 available words from the page and calculating the variance.
If we increase $n$ by 1, such that $n=N$ and we are using all of the available words on the page, we would expect the two equations to give the same result, but they don't because the denominators differ.
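To make the discrepancy concrete, here is a minimal sketch in Python/NumPy (the 100 word lengths are invented example data, not taken from any real page):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical page: 100 word lengths (invented example data)
lengths = rng.integers(1, 12, size=100)

pop_var = np.var(lengths)           # first formula: divides by N = 100
samp_var = np.var(lengths, ddof=1)  # second formula: divides by n - 1 = 99

print(pop_var, samp_var)
# Same data, same sum of squares, but samp_var = pop_var * 100/99
```

Even with every word on the page included, the two formulas disagree by a fixed factor of $100/99$.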
This seems like a paradox. What is the resolution to it?
One possible explanation is that the first equation, strictly speaking, only makes sense in the context of an infinite sample. In that case, both equations give the same result.
In other words, in the limit $n\to\infty$, both equations produce the same result.
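A quick numerical sketch of that limit (with arbitrary simulated data): the ratio between the two results is exactly $n/(n-1)$, which tends to 1 as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 10_000):
    x = rng.normal(size=n)          # arbitrary data of size n
    v_pop = np.var(x)               # 1/n denominator (first formula)
    v_samp = np.var(x, ddof=1)      # 1/(n-1) denominator (second formula)
    print(n, v_samp / v_pop)        # ratio is n/(n-1), approaching 1
```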
This suggests that the second equation, or sample variance, should be used even if we are using all 100 words on the page.
In fact, it should also be used if the sample is not all the words on one page of the book, but all the words on all pages of the book.
Further, it should also be used if calculating the variance in word length for all pages of all books that have been written.
Indeed, the sample variance should also be used to calculate the variance in word length for all pages of all possible books that could be written. That is an infinite set of books, an infinite number of which are infinite in length, and an infinite number of which are finite in length. Except that in this case, because this is the set of all possible books that could be written, we have obtained the complete sample space, and therefore the first equation is equally valid to use here.
Is this correct? If not, what have I misunderstood?
I think that people are understating how surprising this should be. The sample variance $s^2$ is not meant to be the same as the population variance $\sigma^2$, of course, but it's still meant to be a good estimate (at least an unbiased one) of the population variance based on the limited information in the sample. So it would be nice, in the case where we have all of the information about the population, if the variance estimated from complete information were the actual variance, but the formula for sample variance doesn't do this. Instead of $s^2=\sigma^2$, we have $99s^2/100=\sigma^2$, so the uncorrected sample variance $\tilde s^2=(n-1)s^2/n$ is really the best estimate with this information. We can't even say that $s^2$ is unbiased; it always gives us a number that's larger than $\sigma^2$ in this case. (Contrast this with the sample mean $\overline x$ and the population mean $\mu$. In this case, if the sample is the entire population, then $\overline x=\mu$ exactly.)
Look at a proof that $s^2$ is an unbiased estimate of $\sigma^2$. You linked to a Wikipedia page which has an argument for this, but there's a more explicit elementary proof in Alternative 1 on the Wikipedia page dedicated to Bessel's correction (the replacement of $n$ with $n-1$). Here you can see that the proof relies on the individuals in the sample being independent. This is, in my opinion, the key.
So consider the page with $100$ words. If we look at the lengths of all $100$ words, then we can calculate the population mean $\mu$ exactly and the population variance $\sigma^2$ exactly. If we take a sample consisting of all $100$ words and use that to calculate a sample mean $\overline x$ and a sample variance $s^2$, then $\overline x=\mu$ as it should be, but $s^2\ne\sigma^2$ because we divided by $99$ instead of by $100$. But this is not an independent sample, so that calculation of sample variance doesn't apply anyway!
Instead, if we randomly and independently sample $100$ words from the page, now there's no reason to expect that we got each word exactly once. Most likely, we duplicated some words and skipped others. So the sample mean $\overline x$ is not guaranteed to equal the population mean $\mu$, but $\overline x$ is still an unbiased estimate for $\mu$. And of course the sample variance $s^2$ is not guaranteed to equal the population variance $\sigma^2$, but the uncorrected sample variance $\tilde s^2$ isn't guaranteed to equal $\sigma^2$ either. In fact, since our words are probably duplicated and there is probably some clumping together of the lengths in our data, $s^2$ is actually a better estimate of $\sigma^2$ than $\tilde s^2$ would be.
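A Monte Carlo sketch of this claim (again with invented word lengths): averaging over many independent, with-replacement samples of 100 words, $s^2$ tracks $\sigma^2$ while $\tilde s^2$ comes out low by a factor of about $99/100$.

```python
import numpy as np

rng = np.random.default_rng(0)
lengths = rng.integers(1, 12, size=100)   # hypothetical page of 100 word lengths
sigma2 = np.var(lengths)                  # true population variance (divide by N)

# Many independent with-replacement samples of 100 words each
trials = 50_000
samples = rng.choice(lengths, size=(trials, 100), replace=True)
s2_mean = np.var(samples, axis=1, ddof=1).mean()  # corrected: divide by n - 1
s2_tilde_mean = np.var(samples, axis=1).mean()    # uncorrected: divide by n

print(sigma2, s2_mean, s2_tilde_mean)
# On average s^2 matches sigma^2, while s_tilde^2 is about 99/100 of it
```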
And there's nothing special about a sample size of $100$ here either. We could take a sample of $1000$ words from this page, even though there are only $100$ words on the page! This is because each word is chosen independently. There's no way to make a random independent sample equivalent to the actual population.
But what if you take a random sample of $99$ separate words from the page? These words aren't independent, because after picking a word once, you won't pick it again. But it's still random, since one of the words was randomly chosen to be excluded. The best estimate (or at least an unbiased one) for $\mu$ is still $\overline x$. But neither $s^2$ nor $\tilde s^2$ is an unbiased estimate for $\sigma^2$ in this case. The uncorrected sample variance $\tilde s^2$ is still biased towards being too small; it (in effect) assumes that the one skipped word has the same length as the average of the other $99$, when it's probably a little bit off one way or the other. But the corrected sample variance $s^2$ is now biased towards being too large; you won't get any clumping of lengths from repeated words, so you don't need to correct as much. There's an unbiased estimate of $\sigma^2$ somewhere between $s^2$ and $\tilde s^2$, obtained by dividing by a number somewhere between $98$ and $99$ (which I haven't tried to calculate).
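For simple random sampling without replacement, the standard finite-population correction suggests the in-between divisor works out to $(n-1)N/(N-1)$, which for $n=99$, $N=100$ is $98\cdot100/99\approx98.99$ — indeed between $98$ and $99$. A Monte Carlo sketch with invented lengths:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 100, 99
lengths = rng.integers(1, 12, size=N)   # hypothetical page of N word lengths
sigma2 = np.var(lengths)                # true population variance (divide by N)

# Many without-replacement samples of n = 99 distinct words
trials = 50_000
ss = np.empty(trials)
for t in range(trials):
    sample = rng.choice(lengths, size=n, replace=False)
    ss[t] = np.sum((sample - sample.mean()) ** 2)

# Finite-population correction: divide the sum of squares by
# (n - 1) * N / (N - 1) = 98 * 100 / 99 for an unbiased estimate
divisor = (n - 1) * N / (N - 1)
print(sigma2, (ss / divisor).mean())    # these agree up to Monte Carlo noise
```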
If you take a sample of $100$ separate words from the population of $100$ words, then $\tilde s^2=\sigma^2$ and so $\tilde s^2$ is an unbiased (in fact perfect) estimate of $\sigma^2$. If you take a random sample of $1$ separate word, then that's an independent sample, and so $s^2$ is the best estimate (and correctly gives you $0/0$, because you have no information about the population variance whatsoever). If you take a random sample of $n$ separate words for $n$ somewhere between $1$ and $100$, then the unbiased estimate of $\sigma^2$ lies somewhere between $s^2$ and $\tilde s^2$. But if you take a random sample of $n$ independent words (so allowing repetition) for $n>1$ (and quite possibly $n>100$ now), that's when $s^2$ is an unbiased estimate of $\sigma^2$.
So the formula for sample variance involving division by $ n - 1 $ applies when the sample is independent, not otherwise.