Suppose I have a distribution $p(x; \mu, \sigma)$ with finite standard deviation and mean. My goal is to approximate $\sigma$. The problem is that, in my application, I can generate $N$ 'instances' of $p$ with fluctuating mean $\mu_i$ around the true mean $\mu$ with standard deviation $\sigma_{\mu}$. Meaning the function $p$ is exactly the same across instances, except for a shift. From each instance $p(x, \mu_i, \sigma)$, I can only sample a very small number of points $n$.
The fluctuating mean causes $\sigma$ to be overestimated (as a function of $\sigma_{\mu}$) when I simply concatenate all my data points and calculate $$ \sigma^2 = \frac{1}{Nn - 1} \sum_i^{Nn}(x_i - \bar{x})^2,$$ with $$ \bar{x} = \frac{1}{Nn}\sum_i^{Nn} x_i.$$ Taking the average of the standard deviations of each instance $$ \sigma = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\frac{1}{n-1}\sum_{j=1}^{n}(x_j - \bar{x}_j)^2}$$ significantly underestimates $\sigma$ because $n$ is so small (independent of $\sigma_{\mu}$).
To illustrate: the following MATLAB code
trials = 100000;
n = 4;
sigma = 1;
sigmaMean = 0.5 * sigma;
total_a = zeros(n, trials);
a_stds = zeros(1, trials);
for i = 1:trials
mu = normrnd(0, sigmaMean);
a = normrnd(mu, sigma, [1 n]);
total_a(:, i) = a;
a_stds(i) = std(a);
end
meanStd = mean(a_stds);
totalStd = std(total_a(:));
meanA = mean(total_a(:));
mydisp('mean std: %g. std total a: %g, mean a: %g', meanStd, totalStd, meanA)
gives me
mean std: 0.92144. std total a: 1.11657, mean a: -0.000814756
Is there a way to properly find $\sigma$ given this experiment?
I would try de-meaning every $n$-size subsample, then calculate the total SD. In detail:
Index the subsamples by $j=1,2,3,..$, where subsample $j=1$ consists of the first $1,...,n$ observations, $j=2$ consists of observations $n+1,...,2n$ etc.
First, calculate the mean of each subsample: $$\bar{x}^j := \frac{1}{n}\sum_{i=(j-1)n+1}^{jn} x_i.$$
Then estimate the standard deviation by using the appropriate mean for each subsample: $$\hat{\sigma}=\sqrt{\frac{1}{N(n-1)}\sum_{j=1}^N \sum_{i=(j-1)n+1}^{jn} (\bar{x}^j-x_i)^2}. $$ This SD estimate should be smaller than your total SD estimate, as it takes into account that each subsample has a different mean, and that mean has to be estimated first so your SD estimate should be adjusted. This adjustment is why we divide by $N(n-1)$ rather than $Nn$.