Estimate population sum given population size and a sample mean and variance


I have a question which seems easy but I'm not sure how to solve it.

Assume we have a population of $N$ integers, $x_1, x_2, \ldots, x_N$, so $N$ is the population size. I'm interested in the sum of these numbers, $S = \sum_{i=1}^N x_i$. However, $N$ is large, so I'm going to draw $K$ samples uniformly at random and compute the sample sum $m$ and the sample variance $\sigma^2$.

What is the most accurate estimate of $S$ (with bounds, a confidence interval, an expected deviation, etc.) that we can obtain given $m$, $\sigma^2$, $K$, and $N$?

The important thing is that I want to use all of this information (i.e., $K$, $N$, $m$, and $\sigma^2$) to make the estimate more accurate, with tighter bounds. I make no assumptions about the population distribution.

Update: Since no one has answered this, I'm going to explain the background. What I actually want to do is count the total number of words with a particular characteristic in a book with $N$ (e.g. 3000) pages. Finding them is really time-consuming, so I plan to count such words on only $K$ (e.g. 30) pages. Given these $K$ numbers, I want to estimate the total number of such words in the book. I assume the answer will be the sum of these numbers scaled by $N/K$, but what are the error bounds?
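
To make the setup concrete, here is a minimal sketch of the textbook treatment for simple random sampling without replacement. It is an assumption on my part that this is the right model: pages are drawn uniformly without repeats. With sample mean $\bar{x}$ (the sample sum divided by $K$) and unbiased sample variance $s^2$ (denominator $K-1$), the usual estimate is $\hat{S} = N\bar{x}$ with estimated standard error $N\sqrt{(1 - K/N)\,s^2/K}$, where the factor $1 - K/N$ is the finite population correction. The interval below relies on a normal approximation, which is only rough for small $K$ or heavily skewed counts; it is not a distribution-free guarantee. The function name `estimate_total` and the synthetic page counts are hypothetical and for illustration only.

```python
import math
import random
import statistics


def estimate_total(sample, population_size, confidence=0.95):
    """Estimate a population total from a simple random sample drawn
    without replacement (hypothetical helper, illustration only).

    Returns (total_estimate, standard_error, (lower, upper)).
    """
    k = len(sample)
    if k < 2:
        raise ValueError("need at least two sampled values to estimate variance")
    if k > population_size:
        raise ValueError("sample cannot be larger than the population")

    mean = statistics.fmean(sample)
    var = statistics.variance(sample)  # unbiased sample variance, divides by K - 1

    total = population_size * mean  # same as sample sum scaled by N / K
    fpc = 1 - k / population_size   # finite population correction
    se = population_size * math.sqrt(fpc * var / k)

    # Normal approximation (an assumption); a t quantile with K - 1 degrees
    # of freedom would give a somewhat wider interval for small K.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return total, se, (total - z * se, total + z * se)


if __name__ == "__main__":
    # Synthetic data standing in for per-page word counts (made up).
    random.seed(0)
    pages = [random.randint(0, 20) for _ in range(3000)]
    sample = random.sample(pages, 30)  # K = 30 pages, without replacement

    total, se, (low, high) = estimate_total(sample, len(pages))
    print(f"estimated total: {total:.0f}, standard error: {se:.0f}")
    print(f"approximate 95% interval: [{low:.0f}, {high:.0f}]")
    print(f"true total (known only because the data is synthetic): {sum(pages)}")
```

Note that when $K = N$ the correction factor is zero, so the standard error vanishes and the estimate equals the exact sum, which is one way $N$ enters the bound beyond the $N/K$ scaling.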