Unbiased estimator for the sum of numbers


Let $\alpha_1, \dots, \alpha_n \in \mathbb{R}$. We want to approximate the sum as follows $$ S = \sum_{i=1}^{n} \alpha_i \approx \dfrac{n}{c} \sum_{i=1}^{c} \alpha_i, $$ where $\alpha_i$ is picked with probability $$ p_i = \dfrac{\alpha_i}{\sum_{j=1}^{n}\alpha_j}. $$ Some lecture notes I am reading say that this is an unbiased estimator, which seems reasonable, but I am unable to prove it. If I have understood correctly, I should show that the arithmetic mean of the estimator $$ \dfrac{n}{c} \sum_{i=1}^{c} \alpha_i $$ over all possible choices of the $c$ indices is $S$. I think I might use some kind of combinatorial trick, but I have no clue how to enumerate all the possible size-$c$ selections of the $\alpha_i$'s. Any help would be much appreciated.

EDIT: I have added a few necessary details, as pointed out in some of the comments below.

Best answer

The notation is misleading, because it suggests that we always sum the first $c$ numbers rather than a random subset. If I understand it right, we have

$$\hat S= \frac{n}{c} \sum_{j\in T} \alpha_j$$ where $T \subset \{1,2,\dots,n\}$, $|T|=c$, is a random subset. We could prescribe that the size $c$ is fixed, or that it is itself random.

We intend $\hat S$ to be an estimator of $S=\sum_{i=1}^n \alpha_i$.

An alternative way of putting this is to use a random indicator tuple ${\bf X} = (x_1, \dots, x_n)$ with $x_i=1$ if $i\in T$ (that is, $\alpha_i$ is selected) and $x_i=0$ otherwise. Then

$$\hat S = \frac{n}{c} \sum_{i=1}^n x_i \, \alpha_i$$ By linearity of expectation, the estimator is unbiased iff $$ E(\hat S)=\frac{n}{c} \sum_{i=1}^n p_i \, \alpha_i = \sum_{i=1}^n \alpha_i $$ where $p_i = E(x_i)$ is the probability that index $i$ is selected. Whether this holds depends on how the set is selected, that is, on the probability distribution of ${\bf X}$; the only restriction is $\sum_{i=1}^n p_i=c$ (the expected size of $T$). (This is also where the original question goes wrong: its probabilities are normalized to sum to $1$, whereas these selection probabilities should sum to $c$.)

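If we select subsets of size $c$ with uniform probability, then each index belongs to $\binom{n-1}{c-1}$ of the $\binom{n}{c}$ equally likely subsets, so $E(x_i) = \binom{n-1}{c-1}/\binom{n}{c} = c/n$ and the above is true. If instead the elements are selected with probabilities proportional to their values, we should have $p_i = \alpha_i \, c/S$ (this involves the quantity we want to estimate; difficult to say if this makes practical sense). In that case $E(\hat S) = n \sum_{i=1}^n \alpha_i^2 / S$, which by the Cauchy-Schwarz inequality equals $S$ only when all the $\alpha_i$ are equal, so the estimator with the factor $n/c$ is in general biased.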
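For small $n$ the two cases can be checked by brute force, which is exactly the "arithmetic mean over all possible choices" the question asks about. The following is only a sketch: the values in `alphas`, the choice `c = 2`, and the function names are made up for illustration, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations


def estimator(alphas, subset, c):
    """Return (n/c) * sum of alphas over the chosen index subset."""
    n = len(alphas)
    return Fraction(n, c) * sum(alphas[i] for i in subset)


def mean_over_uniform_subsets(alphas, c):
    """Average the estimator over every size-c subset, each equally likely."""
    subsets = list(combinations(range(len(alphas)), c))
    total = sum(estimator(alphas, t, c) for t in subsets)
    return total / len(subsets)


def expectation_from_inclusion(alphas, probs, c):
    """E(S_hat) = (n/c) * sum_i p_i * alpha_i for selection probabilities p_i."""
    n = len(alphas)
    return Fraction(n, c) * sum(p * a for p, a in zip(probs, alphas))


# Hypothetical example data, chosen so that every p_i below stays in [0, 1].
alphas = [Fraction(v) for v in (1, 2, 3, 4, 10)]
c = 2
S = sum(alphas)

# Case 1: uniform size-c subsets, averaged by explicit enumeration.
uniform_mean = mean_over_uniform_subsets(alphas, c)

# Case 2: selection probabilities proportional to the values, p_i = alpha_i * c / S.
proportional = [a * c / S for a in alphas]
proportional_mean = expectation_from_inclusion(alphas, proportional, c)

print("uniform mean equals S:", uniform_mean == S)
print("proportional mean:", proportional_mean, "vs S:", S)
```

With uniform subsets the enumerated mean coincides with $S$, while with the proportional probabilities the expectation comes out different from $S$ for these values. The second case uses only the marginal probabilities $p_i$, so it does not depend on any particular scheme for drawing the subset.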

We could also try to compute (and perhaps minimize) the variance of the estimator, but I guess that this would be feasible only if we assume that the elements $x_i$ are independent (and hence $c$ is random).
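As a rough illustration of the independent case, suppose (this is my assumption, not something stated in the question) that each $x_i$ is an independent Bernoulli variable with $E(x_i) = p_i$, and that the $c$ in the factor $n/c$ is the fixed expected size $\sum_i p_i$ rather than the realized size of $T$. Then the variance is $\left(\frac{n}{c}\right)^2 \sum_{i=1}^n p_i (1-p_i)\, \alpha_i^2$. The sketch below compares this closed form with an exact enumeration over all $2^n$ indicator patterns; the data and names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_moments(alphas, probs, scale):
    """Enumerate all 0/1 patterns of independent indicators; return (mean, variance)."""
    mean = Fraction(0)
    second = Fraction(0)
    for pattern in product((0, 1), repeat=len(alphas)):
        # Probability of this pattern under independent Bernoulli(p_i) draws.
        weight = Fraction(1)
        for x, p in zip(pattern, probs):
            weight *= p if x else 1 - p
        value = scale * sum(x * a for x, a in zip(pattern, alphas))
        mean += weight * value
        second += weight * value * value
    return mean, second - mean * mean


# Hypothetical example data.
alphas = [Fraction(v) for v in (1, 2, 3, 4, 10)]
n = len(alphas)
c = 2  # assumption: c is the expected subset size, not the realized one
probs = [Fraction(c, n)] * n  # uniform selection probabilities p_i = c/n
scale = Fraction(n, c)

mean, var = exact_moments(alphas, probs, scale)
closed_form = scale**2 * sum(p * (1 - p) * a * a for p, a in zip(probs, alphas))

print("unbiased:", mean == sum(alphas))
print("variance matches closed form:", var == closed_form)
```

Both comparisons should hold for this setup. Minimizing the closed form over the $p_i$, subject to the unbiasedness condition and $\sum_i p_i = c$, would be the next step, and I have not worked that out here.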