Unbiased Variance Estimator for Stratified Two-Phase Simple Random Sampling without Replacement


In the context of two-phase sampling for stratification (see Särndal's Model Assisted Survey Sampling, p. 350), in the particular case in which we use simple random sampling without replacement (srswor) in each phase, it is straightforward to check that the variance of the $\widehat{Y}_U^{\text{strHT}^*}$ estimator is given by \begin{equation} V=KS_{yU}^2+V_2\text{, } \end{equation} where $S_{yU}^2=\frac{1}{N-1}\sum_{k\in U}(y_k-\bar{y}_U)^2$ and $K$ is a constant whose exact form does not matter here. We may also ignore $V_2$.

We want to find an unbiased estimator $\widehat{V}$ for $V$. To do that, we look for an unbiased estimator of each term of the sum, which is easy for $V_2$ but not for $S_{yU}^2$.

Note that, for $X$ to be unbiased for $S_{yU}^2$ we have to prove that \begin{equation} S_{yU}^2=E(X)=E(E(X|s_a))\text{, } \end{equation} where $s_a$ is the sample obtained in the first phase.

We know that $E(S_{ys_a}^2)=S_{yU}^2$, so we just need an $X$ such that $E(X|s_a)=S_{ys_a}^2$.
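The fact $E(S_{ys_a}^2)=S_{yU}^2$ under srswor can be sanity-checked by brute force on a tiny population. The sketch below is only an illustration: the population values `y_U` and the helper names `s2` and `expected_sample_s2` are made up for this check and do not come from the book. It enumerates every srswor sample of a given size and averages the sample variances exactly, using fractions.

```python
from fractions import Fraction
from itertools import combinations


def s2(values):
    """Variance with divisor (n - 1), matching the definition of S^2 above."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def expected_sample_s2(population, n):
    """Exact expectation of s2 over all equally likely srswor samples of size n."""
    samples = list(combinations(population, n))
    return sum(s2(s) for s in samples) / len(samples)


# Hypothetical population values, chosen only for illustration.
y_U = [3, 7, 8, 12, 15, 21]

# Compare the exact expectation of the first-phase sample variance
# with the population variance.
print(expected_sample_s2(y_U, 4) == s2(y_U))
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.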

The first-phase sample $s_a$ is partitioned into $H_{s_a}$ strata $(s_{ah})_{h=1}^{H_{s_a}}$, so we can decompose $S_{ys_a}^2$ into its "within" and "between" components, namely \begin{equation} S_{ys_a}^2=\sum_{h=1}^{H_{s_a}}\frac{n_{s_{ah}}-1}{n_{s_a}-1}S_{ys_{ah}}^2+\sum_{h=1}^{H_{s_a}}\frac{n_{s_{ah}}}{n_{s_a}-1}(\bar{y}_{s_{ah}}-\bar{y}_{s_a})^2\text{, } \end{equation} where $n_{s_a}$ is the size of the first-phase sample $s_a$ and $n_{s_{ah}}$ is the number of elements of $s_a$ that fall in the $h$-th stratum.
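The decomposition above can be checked numerically. This is a minimal sketch with invented data: the list `strata` stands in for the stratified first-phase sample, and the function names are hypothetical. Each stratum needs at least two elements so that its variance is defined.

```python
from fractions import Fraction


def mean(values):
    return Fraction(sum(values), len(values))


def s2(values):
    """Variance with divisor (n - 1); requires at least two values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def within_between(strata):
    """Return the two sums of the decomposition of S^2 for the pooled sample."""
    pooled = [v for stratum in strata for v in stratum]
    n_a = len(pooled)
    grand_mean = mean(pooled)
    # First sum: stratum variances weighted by (n_ah - 1) / (n_a - 1).
    within = sum(Fraction(len(s) - 1, n_a - 1) * s2(s) for s in strata)
    # Second sum: squared deviations of stratum means, weighted by n_ah / (n_a - 1).
    between = sum(
        Fraction(len(s), n_a - 1) * (mean(s) - grand_mean) ** 2 for s in strata
    )
    return within, between


# Hypothetical first-phase sample split into three strata.
strata = [[2, 5, 9], [4, 4, 10, 13], [1, 8]]
pooled = [v for stratum in strata for v in stratum]

within, between = within_between(strata)
print(within + between == s2(pooled))
```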

As far as I know, $S_{ys_h}^2$, where $s_h$ is the part of the second-phase sample falling in the $h$-th stratum, is an unbiased estimator of $S_{ys_{ah}}^2$ (given $s_a$), and it's easy to find unbiased estimators of $\bar{y}_{s_{ah}}$ and $\bar{y}_{s_a}$ too. Therefore, substituting these into the decomposition, we obtain our candidate estimator of $S_{yU}^2$, \begin{equation} X=\sum_{h=1}^{H_{s_a}}\frac{n_{s_{ah}}-1}{n_{s_a}-1}S_{ys_h}^2+\frac{n_{s_a}}{n_{s_a}-1}\sum_{h=1}^{H_{s_a}}w_{ah}(\bar{y}_{s_{h}}-\widehat{\bar{y}}_{U})^2\text{. } \end{equation}

The last sum is not the issue, since it matches the book's expression. However, according to the book, the coefficients multiplying the terms of the first sum should be \begin{equation} w_{ah}(1-\delta_h)\text{, } \end{equation} where $w_{ah}=\frac{n_{s_{ah}}}{n_{s_a}}$, $\delta_h=\frac{1}{n_{s_h}}\left(\frac{n_{s_a}-n_{s_{ah}}}{n_{s_a}-1}\right)$, and $n_{s_h}$ is the number of elements of the second-phase sample lying in the $h$-th stratum.
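To make the discrepancy concrete, here is a small sketch comparing the two coefficients for made-up sample sizes. The function names and the numbers are purely illustrative. Working through the algebra, the two coefficients coincide when $n_{s_h}=n_{s_{ah}}$ (the whole stratum is kept in the second phase), and the book's coefficient is the smaller one when $n_{s_h}<n_{s_{ah}}<n_{s_a}$.

```python
from fractions import Fraction


def my_coefficient(n_a, n_ah):
    """Coefficient (n_ah - 1) / (n_a - 1) from the plug-in derivation."""
    return Fraction(n_ah - 1, n_a - 1)


def book_coefficient(n_a, n_ah, n_h):
    """Coefficient w_ah * (1 - delta_h) as quoted from the book."""
    w_ah = Fraction(n_ah, n_a)
    delta_h = Fraction(1, n_h) * Fraction(n_a - n_ah, n_a - 1)
    return w_ah * (1 - delta_h)


# Hypothetical sizes: 20 first-phase units, 8 of them in stratum h.
n_a, n_ah = 20, 8

# Second-phase stratum sample sizes from 2 up to the full stratum.
for n_h in range(2, n_ah + 1):
    print(n_h, book_coefficient(n_a, n_ah, n_h), my_coefficient(n_a, n_ah))
```

So the gap between the two expressions depends only on how much the second phase subsamples the stratum.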

Any idea where the discrepancy in the coefficients of the first sum comes from?