Unbiased estimator of variance


My question is: why is the best and most commonly used estimator of the variance of a Gaussian distribution the sample variance with constant $1/(n-1)$, when the sample variance with constant $1/(n+1)$ has a lower mean squared error?

3 Answers


That depends on what you mean by "best". The reason the Bessel correction (the normalization factor $ n-1 $) is used so frequently is that it's nonparametric (that is, it's unbiased for all distributions, not just normal ones), while a criterion such as minimizing the $ L^2 $ error of the estimator requires some parametric knowledge of the distribution - specifically, knowledge of its kurtosis.
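To see why kurtosis enters, a standard result (stated here for an i.i.d. sample with finite fourth central moment $\mu_4$) gives the variance of the sample variance:

$$\operatorname{Var}(S^2) \;=\; \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right), \qquad \mu_4 = E\!\left[(X-\mu)^4\right].$$

Since $\mu_4 = \kappa\,\sigma^4$ with $\kappa$ the kurtosis, any criterion built on this variance (such as choosing a normalization constant to minimize MSE) depends on $\kappa$. For a normal population $\kappa = 3$ and the expression reduces to $2\sigma^4/(n-1)$.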

In addition, minimizing the $ L^2 $ error may simply not be what you're interested in for a specific application. If you know in advance that you're dealing with a normal distribution (which always has kurtosis $ 3 $), then a denominator of $ n+1 $ indeed minimizes the $ L^2 $ loss, and it can be desirable to use it depending on the objective you have in mind. In this case, the choice of normalization factor is a matter of context and of the decision problem at hand.

I'd say, however, that the most likely reason for the prevalence of the Bessel correction in variance estimation is simply that it's what people are told to do as "good practice" in general, so it's what they end up doing in the majority of cases.


'Best' choice of estimator depends on context and on the purpose of estimation; often 'best' is simply taken to mean 'least variance' or 'least mean squared error (MSE)'. Note that in the class of all estimators, there is no best estimator in the sense of having least MSE for every value of the parameter. So we often confine ourselves to some restricted class of estimators by imposing a criterion such as unbiasedness (usually in small-sample problems). Within this restricted class of unbiased estimators, we then choose an estimator by minimizing MSE (i.e., minimizing variance, since the bias is zero).

Thus unbiasedness combined with minimum variance is a popular criterion for choosing estimators. The sample variance with denominator $n-1$ is the minimum variance unbiased estimator of the population variance when sampling from a normal population, which, in addition to the point made by @Starfall, explains its frequent usage. This estimator is best (in the sense of minimum variance) within the unbiased class.

The other estimator, with denominator $n+1$, has a lower MSE but is not unbiased (although it is asymptotically unbiased). This estimator is also best in the sense of minimum MSE within the class of estimators of the form $c\sum_i(X_i-\overline X)^2$. If for your purpose mean squared error is a more suitable criterion and unbiasedness is not a big deal, then this second estimator is definitely the better choice. Both estimators behave similarly in large samples, though, as one might expect.


Probably the two main reasons for using $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ to estimate the population variance $\sigma^2$ from a normal sample are:

  • UMVUE. The sample variance is unbiased, $E(S^2) = \sigma^2,$ and $Var(S^2)$ is smallest among unbiased estimators. [But note that unbiasedness does not survive the nonlinear square root transformation, so $E(S) < \sigma.$ The bias is small and is usually ignored.]

  • Distributional relationships. There are many familiar and convenient distributional relationships using $S^2$ for testing and making confidence intervals. One of these is $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(\nu = n-1).$ Another is that for two normal samples, $F = S_1^2/S_2^2 \sim \mathsf{F}(\nu_1 = n_1 - 1, \nu_2 = n_2 - 1).$

However, UMVUE is not necessarily the best criterion for an estimator. As another possibility, one may seek an estimator $T$ of parameter $\tau$ that minimizes mean squared error (MSE), which is $E[(T - \tau)^2] = Var(T) + B_\tau(T)^2,$ where the bias is $B_\tau(T) = E(T-\tau).$

One can show that, among estimators of the form $c\sum_{i=1}^n (X_i - \bar X)^2,$ the MSE for estimating $\sigma^2$ is minimized by $\frac{1}{n+1}\sum_{i=1}^n (X_i - \bar X)^2.$
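A sketch of the computation, using the fact that $\sum_i (X_i-\bar X)^2/\sigma^2 \sim \mathsf{Chisq}(n-1)$ for a normal sample: writing $T_c = c\sum_{i=1}^n (X_i - \bar X)^2,$ we have $E(T_c) = c(n-1)\sigma^2$ and $Var(T_c) = 2c^2(n-1)\sigma^4,$ so

$$\operatorname{MSE}(T_c) = Var(T_c) + \bigl[E(T_c)-\sigma^2\bigr]^2 = 2c^2(n-1)\sigma^4 + \bigl[c(n-1)-1\bigr]^2\sigma^4.$$

Differentiating in $c$ and setting the derivative to zero gives $2c + c(n-1) - 1 = 0,$ i.e. $c(n+1) = 1,$ so $c = \frac{1}{n+1}.$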

Of course it's not a proof, but the following simple simulation in R illustrates, using a particular normal population, that the denominator $n+1$ gives smaller MSE (actually, RMSE, which is the square root of MSE) than $n-1.$ It uses $m = 10^5$ samples of size $n = 6$ from $\mathsf{Norm}(\mu = 10, \sigma=2).$ The estimates are v1 for $S^2,$ v2 for denominator $n,$ and v3 for denominator $n+1.$ However, $V_3$ is not a promising candidate for general use; it seriously underestimates the population variance: $E(V_3) = \frac{n-1}{n+1}\sigma^2 = \frac{20}{7} \approx 2.86 < 4 = \sigma^2.$

set.seed(512)
m = 10^5;  n = 6
x = rnorm(m*n, 10, 2)
DTA = matrix(x, nrow=m)       # each row is a sample of size n
v1 = apply(DTA, 1, var)       # denominator n-1 (sample variance S^2)
rmse1 = sqrt(mean((v1-4)^2)); rmse1
[1] 2.534279
mean(v1)
[1] 4.002054                  # approx. E(S^2) = E(V1) = 4
v2 = (n-1)*v1/n               # rescale to denominator n
rmse2 = sqrt(mean((v2-4)^2)); rmse2
[1] 2.21411
v3 = (n-1)*v1/(n+1)           # rescale to denominator n+1
rmse3 = sqrt(mean((v3-4)^2)); rmse3
[1] 2.139998                  # smallest RMSE
mean(v3)
[1] 2.85861                   # approx. (n-1)sigma^2/(n+1) = 20/7

par(mfrow=c(3,1));  cutp=0:28
 hist(v1, prob=T, br=cutp, xlim=c(0,25), col="skyblue2")
 hist(v2, prob=T, br=cutp, xlim=c(0,25), col="skyblue2")
 hist(v3, prob=T, br=cutp, xlim=c(0,25), col="skyblue2")
par(mfrow=c(1,1))
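As a check on the simulation, the exact RMSEs can be computed in closed form under these assumptions ($n = 6,$ $\sigma^2 = 4$), using $\operatorname{MSE}(T_c) = \{2c^2(n-1) + [c(n-1)-1]^2\}\sigma^4$ for $T_c = c\sum_i (X_i - \bar X)^2$:

$$\begin{aligned}
\text{denominator } n-1\!: &\quad \operatorname{MSE} = \tfrac{2\sigma^4}{n-1} = 6.4, & \operatorname{RMSE} \approx 2.530,\\
\text{denominator } n\!: &\quad \operatorname{MSE} = \tfrac{(2n-1)\sigma^4}{n^2} = \tfrac{176}{36} \approx 4.889, & \operatorname{RMSE} \approx 2.211,\\
\text{denominator } n+1\!: &\quad \operatorname{MSE} = \tfrac{2\sigma^4}{n+1} = \tfrac{32}{7} \approx 4.571, & \operatorname{RMSE} \approx 2.138,
\end{aligned}$$

in good agreement with the simulated values $2.534,$ $2.214,$ and $2.140.$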

The figure illustrates that the estimate with denominator $n+1$ has the smallest spread among the three estimators, but its distribution is shifted to the left of $\sigma^2 = 4,$ reflecting its downward bias.

[Figure: histograms of v1, v2, and v3]

Note: In case $\mu$ is known and $\sigma^2$ is unknown, the UMVUE of $\sigma^2$ is $\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2,$ and $\frac{1}{\sigma^2}\sum_{i=1}^n (X_i - \mu)^2 \sim \mathsf{Chisq}(n).$