What's the difference between 'standard error' and 'estimated standard error'?

100 people are given a standard antibiotic to treat an infection and another 100 are given a new antibiotic. In the first group, 90 people recover; in the second group, 80 people recover. Let $p_1$ be the probability of recovery under the standard treatment and let $p_2$ be the probability of recovery under the new treatment. We are interested in estimating $\theta = p_1 - p_2$. Provide an estimate and a 95 percent confidence interval for $\theta$. (For the 95 percent confidence interval, use $z_{0.05/2} = 2$.)

I was given a problem like this. The estimator is $\hat{\theta} = \hat{p}_1 - \hat{p}_2 = 0.9 - 0.8 = 0.1$, and $\widehat{se}$ is 0.05. I thought I had solved the problem.

What bothers me is that someone told me that $p_1$ and $\hat{p}_1$ are the same, and that $se$ and $\widehat{se}$ (the estimated standard error) are the same thing. Until then, my understanding was that I cannot know the true probabilities $p_1$ and $p_2$ (since I don't know what distribution they follow), so I compute $\hat{p}_1$ and $\hat{p}_2$ and from them get $\widehat{se}$.

Am I missing some concept?

Best answer:

First, $\hat\theta = \hat p_1 - \hat p_2 = \frac{90}{100} - \frac{80}{100} = 0.9 - 0.8 = 0.1.$

Then, $V(\hat p_1) = \frac{p_1(1-p_1)}{n},$ so $\hat V(\hat p_1) =\frac{\hat p_1(1-\hat p_1)}{n}.$ Similarly, $\hat V(\hat p_2) =\frac{\hat p_2(1-\hat p_2)}{n}.$ By independence, $V(\hat\theta) = V(\hat p_1) + V(\hat p_2),$ so $$\hat V(\hat\theta) = \frac{\hat p_1(1-\hat p_1)}{n}+ \frac{\hat p_2(1-\hat p_2)}{n}$$ and $$\widehat {SE}(\hat\theta) = \sqrt{\frac{\hat p_1(1-\hat p_1)}{n}+ \frac{\hat p_2(1-\hat p_2)}{n}}.$$

The theoretical standard error of $\hat \theta$ would use $p_i, i=1,2,$ but we do not know the true values of the $p_i.$ So we must use the estimated standard error $\widehat{SE}(\hat\theta)$ (with $\hat p_i$s) instead.
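To make the distinction concrete, here is a minimal sketch in R. It assumes hypothetical "true" values $p_1 = 0.85$ and $p_2 = 0.75$, which are not from the question and would be unknown in a real study. The theoretical standard error is a single fixed number, while the estimated standard error is recomputed from each sample and so varies from sample to sample.

```r
set.seed(1)               # arbitrary seed, for reproducibility only
n <- 100
p1 <- 0.85; p2 <- 0.75    # hypothetical "true" values; unknown in practice

# Theoretical standard error: fixed, but computable only if p1 and p2 are known
se_true <- sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)

# Estimated standard error: plug in sample proportions, so it changes per sample
est_se <- function() {
  p1_hat <- rbinom(1, n, p1) / n
  p2_hat <- rbinom(1, n, p2) / n
  sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
}
se_hats <- replicate(5, est_se())   # five independent simulated studies

se_true
se_hats
```

The values in `se_hats` should scatter around `se_true`; none of them is "the" standard error, each is an estimate of it.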

Thus a 95% confidence interval for $\theta$ is $\hat \theta \pm 1.96 \widehat{SE}(\hat \theta).$
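As a sketch of the arithmetic, the interval can be computed in R directly from the counts in the question. The variable names are my own choices for illustration.

```r
# Observed data from the question
n  <- 100                 # patients per group
x1 <- 90; x2 <- 80        # recoveries: standard treatment, new treatment

p1_hat <- x1 / n
p2_hat <- x2 / n
theta_hat <- p1_hat - p2_hat

# Estimated standard error: sample proportions plugged into the variance formula
se_hat <- sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)

z <- qnorm(0.975)         # approximately 1.96
wald_ci <- theta_hat + c(-1, 1) * z * se_hat

theta_hat
se_hat
wald_ci
```

Replacing `z` with 2, as the problem statement suggests, only widens the interval slightly.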

Notice that two approximations are involved: (1) assuming that $\hat\theta$ is approximately normal, justifying the use of $1.96$ and (2) using the estimated standard error instead of the unknown theoretical standard error.

Thus the confidence interval must be regarded as valid for large $n$ and not necessarily for small or moderate $n.$ I wonder if $n=100$ is large enough for the "95%" confidence interval to have the promised 95% coverage probability.
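One way to look into this is a small simulation, sketched below. It assumes, purely for the sake of the simulation, that the true values are $p_1 = 0.9$ and $p_2 = 0.8$; the coverage for other true values may differ. It simulates many experiments and records how often the standard interval contains the assumed true $\theta$.

```r
set.seed(2020)
n <- 100
p1 <- 0.9; p2 <- 0.8      # assumed true values, for this simulation only
theta <- p1 - p2
B <- 10^4                 # number of simulated experiments

p1_hat <- rbinom(B, n, p1) / n
p2_hat <- rbinom(B, n, p2) / n
theta_hat <- p1_hat - p2_hat
se_hat <- sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)

lower <- theta_hat - qnorm(0.975) * se_hat
upper <- theta_hat + qnorm(0.975) * se_hat

# Fraction of simulated intervals that contain the assumed true theta
coverage <- mean(lower <= theta & theta <= upper)
coverage
```

If `coverage` comes out noticeably below 0.95, that suggests the nominal "95%" is optimistic at this sample size for these assumed values.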

In making confidence intervals for a single binomial proportion $p,$ it has become standard practice to use Agresti-Coull CIs, which use estimates that "append 2 successes and 2 failures" to the data. (This amounts to a 'trick' that closely matches a more complicated and more accurate form of CI.)

To get a CI for $\theta$ in the current case, it might be better to use $\tilde p_1 = 92/104$ instead of $\hat p_1,$ to use $\tilde p_2 = 82/104$ instead of $\hat p_2,$ and to use denominators 104 instead of 100 in $\widehat{SE}(\hat \theta).$

For the standard formula above I get the CI $(0.002, 0.198),$ same as you got, and for the adjusted CI I have $(-0.0035, 0.1958).$
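The adjusted interval can be sketched in R as follows, under the "append 2 successes and 2 failures" adjustment applied to each group. The variable names are illustrative only.

```r
n  <- 100
x1 <- 90; x2 <- 80

# "Append 2 successes and 2 failures" to each group
n_adj  <- n + 4
p1_adj <- (x1 + 2) / n_adj     # 92/104
p2_adj <- (x2 + 2) / n_adj     # 82/104

theta_adj <- p1_adj - p2_adj
se_adj <- sqrt(p1_adj * (1 - p1_adj) / n_adj + p2_adj * (1 - p2_adj) / n_adj)

adj_ci <- theta_adj + c(-1, 1) * 1.96 * se_adj
round(adj_ci, 4)
```

This should reproduce the adjusted interval quoted above. Note that the adjusted interval includes 0, while the standard one barely excludes it.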

A basic parametric 95% bootstrap CI is $(0.00, 0.20),$ as shown below. There is not much difference among the results from the three methods.

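```r
set.seed(2020)
# Simulate 10^4 pairs of sample proportions, using the estimates 0.9 and 0.8
P1 = rbinom(10^4, 100, .9)/100
P2 = rbinom(10^4, 100, .8)/100
TH = P1 - P2                              # simulated values of theta-hat
q = quantile(TH, c(.025, .975)); q        # cut off 2.5% from each tail
##  2.5% 97.5%
##   0.0   0.2

# Histogram of the simulated differences, with the CI limits in red
hist(TH, prob=TRUE, col="skyblue2")
abline(v = q, col="red")
```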
  
