Variance of Variance Estimation Simulation in Matlab

888 Views Asked by At

I am trying to verify, through numerical simulation, the expression for the variance of the variance estimation, namely:

$$ \text{Var}(s^2) = \frac{2}{n \, \sigma^4} $$

where $n$ is the number of samples, and $\sigma$ is the standard deviation of the process (assumed Gaussian), but something is wrong. (For the source of this expression, see http://www.statlect.com/variance_estimation.htm/)

The numerical simulation is quite simple, as shown in the Matlab code below:

% number of Montecarlo runs
NMC = 10000;
% number of samples at each run
n = 100;

% true standard deviation of the random process
stdan = 0.1;
stdMC = [];

% Montecarlo simulation
for j=1:NMC
    % generate n samples of the random process  
    % a white noise with stdan standard deviation and zero mean
    y = stdan*randn(n, 1);

    % estimate the standard deviation based on the n samples of this run. 
    std_i = std(y);

    % store it
    stdMC = cat(1, stdMC, std_i);
end

% compute the estimation error (difference between the estimated standard deviation at each run and the true one)
estd = stdMC - repmat(stdan, NMC, 1);

 % compute the expected variance and standard deviation of the estimator
estd_var_an = (2/n)*stdan^4
estd_std_an = sqrt(estd_var_an)

% compute the actual estimated variance of the estimations
estd_var = var(estd)
estd_std = std(estd)

But it returns

estd_var_an =
   2.0000e-06
estd_std_an =
    0.0014
estd_var =
   5.1535e-05
estd_std =
    0.0072

Showing that the simulated variance estd_var of the estimator is totally different (25 larger) than the true one estd_var_an.

See also: Variance of sample variance?

2

There are 2 best solutions below

3
On BEST ANSWER

Let $X_1, X_2, \dots, X_n$ be a random sample from $Norm(\mu, \sigma)$. As usual, define $$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}.$$ Then the sample variance $S^2$ unbiased; that is $E(S^2) = \sigma^2$. Moreover, $(n - 1)S^2/\sigma^2 \sim Chisq(df = n-1).$ Because the variance of this distribution is $2(n-1),$ we have $V(S^2) = 2\sigma^4/(n-1).$ Of course, by definition $V(S^2) = E[(S^2 - \sigma^2)^2].$

In practice, $\mu$ is rarely known. But if it is, one can estimate $\sigma^2$ with $S_\mu^2 = (1/n)\sum(X_i - \mu)^2.$ Then, similarly $V(S_\mu^2) = 2\sigma^4/n.$

I am more familiar with R, so I will use it to demonstrate the first case in which $\mu$ is unknown, and leave it to you to figure out the MatLab code and the conversion to the less commonly used case (if that is really what you intend).

Specifically, I use $n = 25,$ $\mu = 50$ and $\sigma = 10$ for the samples, and simulate $m = 100\,000$ samples. Simulated values are $E(S^2) \approx 99.98$ and $V(S^2) \approx 831,$ which are within simulation error of the exact values 100, and 833.33, respectively.

 m = 10^5;  n = 25;  mu = 50;  sg = 10
 x = rnorm(n*m, mu, sg)
 DTA = matrix(x, nrow=m)  # each row a sample of n
 v = apply(DTA, 1, var)   # vector of m sample variances (dim 1 for rows)
 mean(v)
 ## 99.98174
 var(v)
 ## 830.7297
 2*sg^4/(n-1)
 ## 833.3333
 mean((v-sg^2)^2)
 ## 830.7217
0
On

In line

estd_var = var(estd)

I was computing the variance of the standard deviation, and not the variance of the variance.

Once again a reminder that moving back and forth between variance and standard deviation leads to errors. Stick with variance.