Just for the sake of not having to write this equation out several times, I will denote the residual sum of squares for a set of data points $X$ (with sample mean $\overline{x}$) as $$ RSS= \sum_{x\in X} \left(x - \overline{x}\right)^2 $$
Now I understand that there are two different formulas for variance, with sample variance being defined as $$ s^2 = \dfrac{RSS}{n-1} $$ and population variance being $$ \sigma^2 = \dfrac{RSS}{n} $$ for $n$ data points.
I understand that we use the sample formula because the population formula is a biased estimator of the true variance.
With that in mind, I was listening to one of my machine learning lectures, and my professor told us to calculate the variance for a naive Bayes classifier using the population formula, since it is proven to be the Maximum Likelihood Estimator for the variance of a Gaussian distribution. I understood this proof as well, but I was confused: why would we choose to use the population-variance formula on our sample when we know it is a biased estimator?
Anyone who could provide some insight would be appreciated!
Item 2 in the list requires more explanation. Let's denote $$ [\hat{\sigma}^{*}]^2 = \frac{1}{k-1}\sum_{i=1}^{k} (x_i - \bar{x})^2 \tag{1} \label{1}$$ $$ \hat{\sigma}^2 = \frac{1}{k}\sum_{i=1}^{k} (x_i - \bar{x})^2 \tag{2} \label{2} $$
The MSE for $\eqref{1}$ is $e^{*} = E\left[\left([\hat{\sigma}^{*}]^2 - \sigma^2\right)^2\right]$
The MSE for $\eqref{2}$ is $e = E\left[\left(\hat{\sigma}^2 - \sigma^2\right)^2\right]$
where $\sigma^2$ is the true variance.
It can be shown that $e^{*} > e$: although $\eqref{2}$ is biased, it has the lower MSE.
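You can check this numerically. Here is a minimal Monte Carlo sketch (not from the original answer; the true variance, sample size, and trial count are arbitrary choices) that estimates the MSE of both $\eqref{1}$ and $\eqref{2}$ on repeated Gaussian samples:

```python
import random

# Monte Carlo comparison of the two variance estimators:
# eq. (1) divides the RSS by k-1 (unbiased), eq. (2) by k (MLE, biased).
random.seed(0)

true_var = 4.0     # sigma^2 of the generating Gaussian (assumed for the demo)
k = 10             # sample size
trials = 200_000   # number of repeated samples

se_unbiased = 0.0  # accumulated squared error of eq. (1)
se_mle = 0.0       # accumulated squared error of eq. (2)
for _ in range(trials):
    xs = [random.gauss(0.0, true_var ** 0.5) for _ in range(k)]
    xbar = sum(xs) / k
    rss = sum((x - xbar) ** 2 for x in xs)
    se_unbiased += (rss / (k - 1) - true_var) ** 2
    se_mle += (rss / k - true_var) ** 2

mse_unbiased = se_unbiased / trials
mse_mle = se_mle / trials
print(mse_unbiased, mse_mle)  # the biased MLE shows the smaller MSE
```

For Gaussian data the theoretical values are $e^{*} = 2\sigma^4/(k-1)$ and $e = (2k-1)\sigma^4/k^2$, so with $\sigma^2 = 4$ and $k = 10$ the simulation should land near $3.56$ versus $3.04$.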