Confidence interval of population standard deviation using measurements with uncertainties

Let's say I have a sample of N measurements which I use to calculate the standard deviation of said sample:

$s=\sqrt{\frac{\sum{(x-\bar{x})^2}}{N-1}}$

I can use this value to place a 68% confidence interval on the population standard deviation:

$P(\sqrt{\frac{ks^2}{q_{0.84}}}<\sigma<\sqrt{\frac{ks^2}{q_{0.16}}})=0.68$,

where $k=N-1$ and $q_{0.16}$, $q_{0.84}$ are the 0.16 and 0.84 quantiles of the chi-squared distribution with $k$ degrees of freedom.
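As a minimal sketch of the interval above, assuming the data are normally distributed and taking $k=N-1$ degrees of freedom: the function name `sd_ci_chisq`, its `level` argument, and the simulated data are hypothetical and serve only as an illustration.

```r
# Chi-squared confidence interval for sigma, assuming normally distributed data.
# `sd_ci_chisq` and `level` are hypothetical names used only for illustration.
sd_ci_chisq <- function(x, level = 0.68) {
  k <- length(x) - 1                     # degrees of freedom, k = N - 1
  s2 <- var(x)                           # sample variance s^2
  alpha <- 1 - level
  q_lo <- qchisq(alpha / 2, df = k)      # q_{0.16} when level = 0.68
  q_hi <- qchisq(1 - alpha / 2, df = k)  # q_{0.84} when level = 0.68
  # The larger quantile gives the lower limit, and vice versa
  c(lower = sqrt(k * s2 / q_hi), upper = sqrt(k * s2 / q_lo))
}

set.seed(1)                       # arbitrary seed for a reproducible demo
x <- rnorm(30, mean = 0, sd = 2)  # simulated data, not real measurements
sd_ci_chisq(x)
```

This interval accounts only for sampling variability; it does not use the per-measurement uncertainties.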

However, my individual measurements have uncertainties, which means that $s$ itself has an associated error (I am using the formulation from Ahn & Fessler 2003 to calculate it, although the choice is not critical here).

My question is: How do I incorporate the error on $s$ into the calculation of the confidence interval for the population standard deviation?

Thank you very much for your help

There are 1 best solutions below

Here is an example of one kind of bootstrap CI for an unknown population standard deviation. Suppose I have $n=100$ observations.

To get data for a demonstration, I sample from a t distribution with $\nu = 12$ DF, hence $\sigma^2 = \nu/(\nu-2) = 12/10 = 1.2,$ $\sigma = \sqrt{1.2} = 1.0955.$ [You would substitute your data for my x. Presumably, your data are modeled to be nearly but not exactly normal, as mentioned in your question.]

set.seed(1114)
x = rt(100, 12)
s.obs = sd(x);  s.obs
[1] 1.102585

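set.seed(2020)
# 5000 re-samples of size 100; each re-sampled SD is divided by the observed SD
d.re = replicate(5000, sd(sample(x, 100, replace=TRUE))/s.obs)
UL = quantile(d.re, c(.975,.025))   # upper quantile first, then lower
s.obs/UL                            # bootstrap CI: (lower, upper)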
     97.5%      2.5% 
 0.9659272 1.3009683 

Notice that the 95% nonparametric bootstrap CI $(0.966, 1.301)$ contains the observed SD $S_{obs} = 1.1026$ of the data x and, in this simulated example, also the true value $\sigma = 1.0955.$

Brief rationale for bootstrap code: If we knew the distribution of $S/\sigma$ then we could get $L$ and $U$ such that $P\left(L \le \frac{S}{\sigma}\le U\right) = 0.95.$ Then a 95% CI for $\sigma$ would be of the form $\left(\frac{S}{U}, \frac{S}{L}\right).$
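The inversion step in this rationale can be written out explicitly. Since $S$, $\sigma$, $L$ and $U$ are all positive, taking reciprocals reverses the inequalities:

$P\left(L \le \frac{S}{\sigma}\le U\right) = P\left(\frac{1}{U} \le \frac{\sigma}{S}\le \frac{1}{L}\right) = P\left(\frac{S}{U} \le \sigma \le \frac{S}{L}\right) = 0.95.$

For comparison, if the data were exactly normal, then $(n-1)S^2/\sigma^2$ would have a chi-squared distribution with $n-1$ degrees of freedom, so one could take $L = \sqrt{q_{0.025}/(n-1)}$ and $U = \sqrt{q_{0.975}/(n-1)},$ where $q_p$ denotes the $p$ quantile of that distribution. This recovers the chi-squared interval used in the question, at the 95% level.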

In order to approximate the unknown distribution of $S/\sigma$ we enter the 'bootstrap world', temporarily using s.obs $(S_{obs})$ as a proxy for $\sigma.$ We take $B = 5000$ samples of size $n=100$ with replacement from x. This is called re-sampling. Then we obtain the standard deviations $S^*$ of these samples and divide each by s.obs to get values $D^* = S^*/S_{obs},$ whose distribution approximates the unknown distribution of $S/\sigma.$ We take upper and lower quantiles $U^*, L^*$ of the simulated distribution of the $D^*$ values.

Then, returning s.obs to its original role as our observed sample SD, we obtain the 95% nonparametric bootstrap CI of the form $\left(\frac{S_{obs}}{U^*},\frac{S_{obs}}{L^*}\right).$ In the R code the suffix .re marks the re-sampled quantities denoted above by $*$.

This is called a nonparametric bootstrap because the bootstrap procedure has not assumed that the data were sampled from a normal distribution (or another distribution of known type). We have assumed only that our $n=100$ observations are randomly sampled from some distribution for which the standard deviation exists.
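The re-sampling steps described above can be wrapped in a small function so that other confidence levels, such as the 68% level asked about in the question, are easy to obtain. This is only a sketch: the names `boot_sd_ci`, `B` and `level` are hypothetical, and the function handles sampling variability only, not the per-measurement uncertainties.

```r
# Nonparametric bootstrap CI for sigma based on the ratio D* = S*/S_obs.
# `boot_sd_ci`, `B`, and `level` are hypothetical names for this sketch.
boot_sd_ci <- function(x, B = 5000, level = 0.95) {
  n <- length(x)
  s_obs <- sd(x)
  # Re-sample with replacement and record each ratio D* = S*/S_obs
  d_re <- replicate(B, sd(sample(x, n, replace = TRUE)) / s_obs)
  alpha <- 1 - level
  # Upper quantile U* first, then lower quantile L*
  ul <- quantile(d_re, c(1 - alpha / 2, alpha / 2), names = FALSE)
  # Dividing by U* gives the lower limit, dividing by L* the upper limit
  c(lower = s_obs / ul[1], upper = s_obs / ul[2])
}

set.seed(1114)
x <- rt(100, 12)             # same kind of demonstration data as above
set.seed(2020)
boot_sd_ci(x, level = 0.68)  # a 68% interval instead of 95%
```

With the same seed, the 68% interval is nested inside the 95% interval, because both are read off the same simulated $D^*$ values.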