Just for the sake of not having to write this equation out several times, I will denote the residual sum of squares for a set of data points $X$ (with sample mean $\overline{x}$) as $$ RSS= \sum_{x\in X} \left(x - \overline{x}\right)^2 $$
Now I understand that there are two different formulas for variance, with sample variance being defined as $$ s^2 = \dfrac{RSS}{n-1} $$ and population variance being $$ \sigma^2 = \dfrac{RSS}{n} $$ for $n$ data points.
I understand that we use the sample formula because the population formula is a biased estimator of the true variance.
With that in mind, I was listening to one of my machine learning lectures, and my professor told us to calculate the variance for a naive Bayes classifier using the population formula, since it is proven to be the Maximum Likelihood Estimator for the variance of a Gaussian distribution. I understood this proof as well, but I was confused: why would we choose to use the population-variance formula on our sample when we know it is a biased estimator?
Anyone who could provide some insight would be appreciated!
Item 2 in the list requires more explanation. Let's denote $$ [\hat{\sigma}^{*}]^2 = \frac{1}{k-1}\sum_{i=1}^{k} (x_i - \bar{x})^2 \tag{1} \label{1}$$ $$ \hat{\sigma}^2 = \frac{1}{k}\sum_{i=1}^{k} (x_i - \bar{x})^2 \tag{2} \label{2} $$
The MSE for $\eqref{1}$ is $e^{*} = E\left[\left([\hat{\sigma}^{*}]^2 - \sigma^2\right)^2\right]$
The MSE for $\eqref{2}$ is $e = E\left[\left(\hat{\sigma}^2 - \sigma^2\right)^2\right]$
where $\sigma^2$ is the true variance.
It can be shown that $e^{*} > e$: although $\eqref{2}$ is biased, it has the lower MSE.
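You can check this numerically. Here is a minimal Monte Carlo sketch (not from the original answer; the true variance, sample size, and trial count are arbitrary choices) that estimates the MSE of both $\eqref{1}$ and $\eqref{2}$ on repeated Gaussian samples:

```python
import random

# Monte Carlo comparison of the two variance estimators:
# eq. (1) divides the RSS by k-1 (unbiased), eq. (2) by k (MLE, biased).
random.seed(0)

true_var = 4.0     # sigma^2 of the generating Gaussian (assumed for the demo)
k = 10             # sample size
trials = 200_000   # number of repeated samples

se_unbiased = 0.0  # accumulated squared error of eq. (1)
se_mle = 0.0       # accumulated squared error of eq. (2)
for _ in range(trials):
    xs = [random.gauss(0.0, true_var ** 0.5) for _ in range(k)]
    xbar = sum(xs) / k
    rss = sum((x - xbar) ** 2 for x in xs)
    se_unbiased += (rss / (k - 1) - true_var) ** 2
    se_mle += (rss / k - true_var) ** 2

mse_unbiased = se_unbiased / trials
mse_mle = se_mle / trials
print(mse_unbiased, mse_mle)  # the biased MLE shows the smaller MSE
```

For Gaussian data the theoretical values are $e^{*} = 2\sigma^4/(k-1)$ and $e = (2k-1)\sigma^4/k^2$, so with $\sigma^2 = 4$ and $k = 10$ the simulation should land near $3.56$ versus $3.04$.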