Variance Estimator in Simple Random Sampling Without Replacement


I have to find an unbiased estimator of the population variance under simple random sampling without replacement.

The hint for the demonstration is: $$ \frac{1}{N} \sum_{k=1}^{N} (x_{k} - \bar{x}_{U})^2 = \frac{1}{2N^2} \sum_{k=1}^{N}\sum_{\substack{l=1\\ l\neq k}}^{N} (x_{k} - x_{l})^2 $$
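As a sanity check before proving anything, the identity can be verified numerically on a small, arbitrary population (note the inner term is the raw element $x_l$, not a mean):

```python
import numpy as np

# Arbitrary small population, just for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
N = len(x)

# Left side: (1/N) * sum_k (x_k - x_bar)^2.
lhs = np.mean((x - x.mean()) ** 2)

# Right side: (1/(2N^2)) * sum over ordered pairs k != l of (x_k - x_l)^2.
# The k = l terms vanish, so summing over all ordered pairs is harmless.
diffs = x[:, None] - x[None, :]
rhs = (diffs ** 2).sum() / (2 * N ** 2)

print(np.isclose(lhs, rhs))  # True
```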

I start like this, but I don't know if this is right:

$\frac{1}{2N^2}\sum_{k=1}^{N}\sum_{\substack{l=1\\ l\neq k}}^{N} (x_{k}- x_{l})^2$

$= \frac{1}{2N^2}\sum_{k=1}^{N}\sum_{\substack{l=1\\ l\neq k}}^{N} (x_{k}^2 - 2x_{k}x_{l} + x_{l}^2)$

$= \frac{1}{2N^2}\left(\sum_{k=1}^{N}\sum_{\substack{l=1\\ l\neq k}}^{N} x_{k}^2 - 2\sum_{k=1}^{N}\sum_{\substack{l=1\\ l\neq k}}^{N} x_{k}x_{l} + \sum_{k=1}^{N}\sum_{\substack{l=1\\ l\neq k}}^{N} x_{l}^2\right)$

Here I am stuck.

Best Answer

Let $(y_1,...,y_n)$ be a simple random sample without replacement from the population $(x_1,...,x_N).$ Then the population mean and variance are, respectively, $$\begin{align}\mu:&={1\over N}\sum_{i=1}^Nx_i\\ \sigma^2:&={1\over N}\sum_{i=1}^N(x_i-\mu)^2.\end{align}$$ The following is a sketch of how to show that $$\begin{align}E\left({N-1\over N}{1\over n-1}\sum_{i=1}^n(y_i-\bar{y})^2\right)=\sigma^2.\end{align}$$
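Before the proof, the claim can be checked by simulation. Here is a Monte Carlo sketch; the exponential population, its size $N=50$, and the sample size $n=12$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=50)               # arbitrary population, N = 50
N, n = len(x), 12
sigma2 = np.mean((x - x.mean()) ** 2)      # population variance (1/N form)

reps = 50_000
est = np.empty(reps)
for r in range(reps):
    y = rng.choice(x, size=n, replace=False)  # SRS without replacement
    # (N-1)/N * (1/(n-1)) * sum (y_i - y_bar)^2; ddof=1 gives the 1/(n-1).
    est[r] = (N - 1) / N * y.var(ddof=1)

print(round(sigma2, 3), round(est.mean(), 3))  # the two should nearly agree
```

Averaged over many samples, the rescaled sample variance settles on $\sigma^2$, as the result asserts.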


Aside: Some authors differ on the definition of "population variance", taking it to be the quantity $$S^2:={N\over N-1}\sigma^2= {1\over N-1}\sum_{i=1}^N(x_i-\mu)^2,$$ presumably to allow the above unbiasedness result to be written as follows:

$$\begin{align}E\left({1\over n-1}\sum_{i=1}^n(y_i-\bar{y})^2\right)=S^2.\end{align}$$


By the OP's identity, applied to the sample,

$$\begin{align}E\left(\frac{1}{n} \sum_{i =1}^{n} (y_{i} - \bar{y})^2\right) &= \frac{1}{2n^2} \sum_{i =1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n} E(y_i - y_j)^2\\ &={1\over 2n^2} n(n-1)E(y_1-y_2)^2\\ &={1\over 2n^2} n(n-1)E\left((y_1-\mu)-(y_2-\mu)\right)^2\\ &={1\over 2n^2} n(n-1)E\left((y_1-\mu)^2+(y_2-\mu)^2-2(y_1-\mu)(y_2-\mu)\right)\\ &={1\over 2n^2} n(n-1)\,2(\sigma^2-\text{cov}(y_1,y_2))\\ &={1\over 2n^2} n(n-1)\,2\left(\sigma^2-\left(-{\sigma^2\over N-1}\right)\right)\\[2ex] &={n-1\over n}{N\over N-1}\sigma^2. \quad\quad\quad\quad\quad\quad\quad\quad\text{QED}\end{align}$$ In the above, the second equality uses the fact that the $n(n-1)$ ordered pairs $(y_i,y_j)$ with $i\neq j$ are identically distributed, and the covariance term is obtained as follows, because each of the $N(N-1)$ possible outcomes for $(y_1-\mu)(y_2-\mu)$ is equally likely: $$\begin{align}\text{cov}(y_1,y_2) &=E\left((y_1-\mu)(y_2-\mu)\right)\\ &=\frac{1}{N(N-1)} \sum_{i =1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N} (x_i-\mu)(x_j-\mu)\\ &=\frac{1}{N(N-1)} (-N\sigma^2)\\ &=-{\sigma^2\over N-1} \end{align}$$ where we have used $$\sum_{i =1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N} (x_i-\mu)(x_j-\mu)=-N\sigma^2,$$ which is a consequence of the following identity: $$\begin{align}0^2=\left(\sum_{i=1}^N(x_i-\mu)\right)^2 &=\sum_{i=1}^N(x_i-\mu)^2 + \sum_{i =1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N} (x_i-\mu)(x_j-\mu)\tag{*}\\ &=N\sigma^2 + \sum_{i =1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N} (x_i-\mu)(x_j-\mu).\end{align}$$
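The covariance value $-\sigma^2/(N-1)$ can also be confirmed by brute force: since every ordered pair with distinct indices is equally likely, $\text{cov}(y_1,y_2)$ is the plain average of $(x_i-\mu)(x_j-\mu)$ over all $N(N-1)$ pairs (an arbitrary 8-point population for illustration):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
x = rng.normal(size=8)                     # arbitrary population
N = len(x)
mu = x.mean()
sigma2 = np.mean((x - mu) ** 2)

# Under SRSWOR every ordered pair (i, j) with i != j is equally likely,
# so cov(y_1, y_2) is the plain average over the N(N-1) pairs.
cov = np.mean([(x[i] - mu) * (x[j] - mu)
               for i, j in permutations(range(N), 2)])

print(np.isclose(cov, -sigma2 / (N - 1)))  # True
```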

Note that (*) is just a special case (with $z_i=x_i-\mu$, so $\sum z_i=0$) of the general identity $$\left(\sum_{i=1}^N z_i\right)^2 =\sum_{i=1}^Nz_i^2 + \sum_{i =1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}z_iz_j. $$
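The general identity is easy to confirm numerically for any vector $z$, splitting the squared sum into diagonal and off-diagonal products:

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=6)                     # arbitrary vector

total_sq = z.sum() ** 2                    # (sum z_i)^2
diag = (z ** 2).sum()                      # sum z_i^2
cross = np.outer(z, z).sum() - diag        # sum over i != j of z_i * z_j

print(np.isclose(total_sq, diag + cross))  # True
```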

Sources:

http://dept.stat.lsa.umich.edu/~moulib/sampling.pdf
https://issuu.com/patrickho77/docs/mth_432a_-_introduction_to_sampling