probability of error in sampling to estimate sum of a population

72 Views Asked by At

Given non-negative numbers $$\{m_1, m_2,\dots,m_n\}$$ we have to estimate the sum $$s = \sum_{i=1}^nm_i$$ using sampling (with replacement). If we sample k numbers uniformly at random, then I can estimate the error in the sum being off by some amount t, but I have a problem where the probability of picking the i-th number is proportional to it, i.e. $$Pr[\text{selecting }m_i] = \frac{m_i}{s}$$

Here is what I have done so far: Let us sample k numbers with replacement. We pick numbers $$m_{i_1}, m_{i_2}, \dots, m_{i_k}$$ The sum of these, $$M = \sum_{j=1}^km_{i_j} \\E[M] = \sum_{j=1}^kE[m_{i_j}] = \sum_{j=1}^k[\sum_{m_i}Pr[m_{i_j} = m_i]m_i] = \sum_{j=1}^k[\sum_{l=1}^n\frac{m_l}{s}m_l] = \sum_{j=1}^k\sum_{l=1}^n\frac{m_l^2}{s}$$ Here I get stuck as to how to apply any concentration bound like Hoeffding's or any other to a statement like $$Pr[|\frac{n}{k}\sum_{j=1}^km_{i_j} - s| \geq y] \leq ?$$ As the estimator from the sample for the sum of the population would be $$\frac{n}{k}\sum_{j=1}^km_{i_j}$$

1

There are 1 best solutions below

3
On

You can’t get a concentration result here because

$$ E\left[\frac nkM\right]\ne s\;. $$

For instance, for $m_1=1$, $m_2=2$ and $k=1$, we have $s=3$ but

$$ E\left[\frac nkM\right]=\frac21\cdot\frac13\left(1^2+2^2\right)=\frac{10}3\;. $$