Expect number of samples, without replacement, from a set such that the sum of sampled elements exceeds a threshold.

35 Views Asked by At

I apologize in advance if this question was already asked; I just could not find it!

Say I have the set, C, such that:

$$C=\{p | p_i\geq0;\sum_{i=0}^n p_i=1\}$$

A simple example, I let C={0, 0, 0, 0.1, 0.1, 0.1, 0.3, 0.4}. Here, n=8.

The entries were generated by selecting each value as a real number uniformly in the range $[0,1]$, summing them, and dividing by that sum. In this way normalization is guaranteed.

I am trying to solve the question of what is the expected number of elements I need to sample, without replacement, from C prior to the sum of the sampled elements exceeding a set threshold (e.g. 0.4).

Presently, I am running simulations to determine the expected number of samples, where I randomly choose elements and remove elements from C until I meet the threshold. For the example above, my simulation estimates it takes an average of 3.3 samples, without replacement, prior to the sum of sampled elements exceeding 0.4. The standard deviation is 1.6. The mean and standard deviation reflect results 1000 simulations.

Issues arise when C becomes more sparse and n becomes big, there is a lot of variance which requires more iterations to get an accurate estimate of the mean. I am looking for an analytical solution to this problem. This strikes me as being a coupon-collector like problem.

Thank you!