Estimating standard deviation from first differences


This is a problem from work. Given a sequence of i.i.d. Gaussians $X_i \sim N(0, \sigma)$, I need to estimate $\sigma,$ but the twist is I can only observe the differences in successive samples. I.e. I cannot observe $X_i,$ but rather, I get to observe $Y_i \equiv X_i - X_{i-1}$.

Now of course, each $Y_i \sim N(0, \sqrt{2} \sigma),$ so I can use standard techniques e.g. sample variance of $Y_i$. However, if I understand correctly, those techniques assume the $Y_i$'s are independent, whereas here I would think they are not. (My gut feel: e.g. if all the $Y_i$ so far are pretty small in magnitude and then suddenly $Y_{i+1}$ is very positive, that probably means $X_{i+1}$ is a large positive outlier, and so $Y_{i+2}$ would be very negative.)

I understand there are standard concerns with the standard techniques, e.g. biased vs unbiased, estimating variance vs estimating $\sigma$, etc. My problem came from working with actual data (for which we can't be sure the $X_i$ are Gaussian anyway), so I need, not an exact theoretical answer, but a justifiable answer which is reasonably accurate in practice and where I feel I have a handle on all the issues. After reading some literature, I feel I have a handle on the biased vs unbiased issue and the variance vs $\sigma$ issue. But I have not seen any literature on the issue of the $Y_i$'s being (I think) dependent, and don't know how big an error I might be introducing.

So my question: assuming:

  • $Y_i \equiv X_i - X_{i-1}$ are indeed dependent (i.e. my gut feel is correct),

  • standard techniques e.g. sample variance indeed assume the samples are independent,

then: does using e.g. the sample variance introduce a systematic bias, and if so, what is the best way to correct it? An exact answer would be great, but if not, some possible approach or literature would also be much appreciated. (Comments on my assumptions above would also be appreciated.)

Best answer

Comment continued: I am not sure I understand exactly what you are allowed to observe, so I am not making recommendations. However, based on comments above, which are easily proved analytically, here are results of a simulation in R with a million observations from $\mathsf{Norm}(\mu=100,\sigma=15).$ Autocorrelation plots of the sequence of differences show significant autocorrelation for lag 1; no significant autocorrelations among alternate differences.
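The analytic claim is short enough to sketch here. This is only a sketch, assuming the $X_i$ are i.i.d. with finite variance $\sigma^2$ and writing $D_i = X_i - X_{i-1}$ for the differences (the notation $D_i$ is mine, chosen to match the code comments):

$$\operatorname{Cov}(D_i, D_{i+1}) = \operatorname{Cov}(X_i - X_{i-1},\; X_{i+1} - X_i) = -\operatorname{Var}(X_i) = -\sigma^2,$$

$$\operatorname{Corr}(D_i, D_{i+1}) = \frac{-\sigma^2}{2\sigma^2} = -\frac{1}{2}, \qquad \operatorname{Cov}(D_i, D_{i+k}) = 0 \quad \text{for } k \ge 2.$$

So adjacent differences are negatively correlated because they share one $X$ term with opposite signs, while alternate differences $D_1, D_3, D_5, \ldots$ are built from disjoint pairs of independent $X$ values and are therefore independent of one another.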

m = 10^6;  x = rnorm(m+1, 100, 15)     # m+1 observations give m differences
d = diff(x);  dh = d[seq(1, m, by=2)]  # all differences; alternate differences D1, D3, D5, ...
mean(x);  sd(x)
[1] 100.0126       # aprx E(X) = 100
[1] 14.99581       # aprx SD(X) = 15
mean(dh);  sd(dh)/sqrt(2)  # SD(D) = sqrt(2)*SD(X), so divide by sqrt(2)
[1] -0.008337449   # aprx E(D) = 0
[1] 15.00438       # aprx SD(D)/sqrt(2) = SD(X) = 15
par(mfrow=c(1,2))  # autocorrelation function plots, side by side
  acf(d);  acf(dh)
par(mfrow=c(1,1))  # restore the single-panel layout

[Figure: autocorrelation function (ACF) plots of all differences d (left) and of alternate differences dh (right)]
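On the question of bias when all $n$ differences are used rather than every other one, a rough calculation may help. It is a sketch under the same assumptions as before (i.i.d. $X_i$ with finite variance $\sigma^2$, not necessarily Gaussian), with $D_i = X_i - X_{i-1}$ and $S_D^2$ denoting the usual sample variance of $D_1, \ldots, D_n$; these symbols are my own notation. The sum of the differences telescopes, so

$$\bar D = \frac{1}{n}\sum_{i=1}^{n} D_i = \frac{X_n - X_0}{n}, \qquad \operatorname{Var}(\bar D) = \frac{2\sigma^2}{n^2},$$

$$E\left[S_D^2\right] = \frac{1}{n-1}\left(\sum_{i=1}^{n} E\left[D_i^2\right] - n\,E\left[\bar D^2\right]\right) = \frac{1}{n-1}\left(2n\sigma^2 - \frac{2\sigma^2}{n}\right) = 2\sigma^2\,\frac{n+1}{n}.$$

Under these assumptions, $S_D^2/2$ overestimates $\sigma^2$ by a factor of $1 + 1/n$, which is negligible for large $n$, and $n S_D^2 / (2(n+1))$ would be unbiased for $\sigma^2$. The calculation says nothing about the sampling variability of the estimate, which the dependence between adjacent differences does affect.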