The expected weight-ratio between weighted and un-weighted balls when picked from a bin without replacement

78 Views Asked by At

The Problem

The problem, I believe, can be stated in the following way: Given $K$ white balls all with without weight (one can say that the weight is $0$) and $N - K$ red balls with individual strictly positive weights $> 0$, if $n \leq N$ balls are picked at uniformly random without replacement, what is the expected weight ratio between the $k$ selected balls' weights and the total sum of all $N-K$ red balls' weights?

Comment: without any knowledge of neither how the weights are distributed among the red balls, nor of how many red balls there are compared to white, I guess that it could be the case that the answer is too general. I have, either way, not been able to come up with the precise algebraic formula for the solution.

My Thoughts & Efforts

I have in my efforts followed two trains of reasoning, none of which if followed could result in the sought after solution. They are mentioned as to point out that my approach has been towards finding ways that could give a rough idea of the bounds of a solution. The calculations are when no reference is found mostly my own calculation, and as such not reviewed by anyone, so consequently both these paths might be and likely are, subject to a high degree of miscalculations. I highly value any input.

The Heaviest Ball

(I) - Firstly, the heaviest ball is expected to be one of the first $\lceil {log\ N} \rceil + 1$ visited elements. This I believe can be shown by letting $X$ denote the random variable such that $X_i = 1$ for $1 \leq i \leq N$ represents the event of the heaviest ball being the $i:th$ ball being evaluated, and $X_i = 0$ of it not being the heaviest. For any specific $j$ the expected value of the heaviest being evaluated can be written as in (1). \begin{equation} \mathbb E[X_i] = 0 \times P[X_i=0] + 1 \times P[X_i=1] = P[X_i=1] \quad \quad (1) \end{equation} If we assume that all such events are mutually independent and that the balls are picked without replacement, then the probability of $P[X_i=1]$ can be written as (2). \begin{equation}\label{fun:fun_ct} P[X_i=1] = \frac{1}{N+1-i} \quad \quad (2) \end{equation} The expected number of tries before an interval from the optimal solution follows from linearity of expectation over the random variables. \begin{align*} \mathbb E[X] &= \sum_{i=1}^n P[X_i]\\ &= \sum_{i=1}^N (\frac{1}{N+1-i}) \\ &< \sum_{i=1}^N \frac{1}{i} \\ &\leq log(N) + 1 \end{align*}

From the Wikipedia page on harmonic series [1] the following relation is stated $\sum_{i=1}^{N} \frac{1}{i} \leq log(N) + 1$ which leads me to believe that if $n \geq log(N) + 1$ the, searched for, expected weight ratio should be bounded by $\frac{1}{N-K}$. If this process is repeated for rounds $r := \lceil \frac{n}{\log(N) + 1} \rceil$ wherein for each round $n$ balls are picked, then by linearity of expectation, should we not receive a weight ratio bounded from above(?) by $\frac{r}{N-K}$?

The Heaviest m Balls

(II) - Secondly, with a arguably severe abuse of terminology, I say that the expected number of red balls after picking $n$ follows from the hypergeometric distribution such that $Y \sim Hypergeometric(N, N-K, n)$ and could, therefore, be expressed, by following the example in [2], algebraically as the expression $\mathbb E[Y] = \sigma n$ where $\sigma := \frac{N-K}{N}$.

If one re-colors the $m$ heaviest red balls green such that $m < n$, then by the same reasoning the expected number of green balls after picking $n$ balls, should be $\hat{\sigma} := \frac{N-m}{N}$. The expected ratio should, therefore, be bounded from below by \begin{equation} L := \frac{\sum_{j=1}^{\hat{\sigma} n} \alpha^{j} + \sum_{j=\hat{\sigma}n +1 }^{\sigma n} \beta^{j}} {\sum_{j=1}^{N-K}\alpha^{j}} \end{equation} where $\beta^j$ is the weight of $j:th$ lightest ball, and from above by \begin{equation} U_1 := \frac{\sum_{j=1}^{\sigma n} \alpha^{j}}{\sum_{j=1}^{N-K}\alpha^{j}} \end{equation} where $\alpha^j$ is the weight of the $j:th$ heaviest ball.

Comment: we have for the upper-bound here settled with the sequence of the heaviest weights, but if one could compute the expected number of index between $n\sigma$ and $n\hat{\sigma}$ and denote it $\mathbb E[I]$, then the upper-bound can be tightened to \begin{equation} U_2 := \frac{\sum_{j=1}^{\hat{\sigma} n} \alpha^{j} + \sum_{j=\hat{\sigma}n + \mathbb E[I]}^{\sigma n + \mathbb E[I]} \alpha^{j}} {\sum_{j=1}^{N-K}\alpha^{j}} \end{equation} It might actually be that case that $\mathbb E[I] = \mathbb E[X]$ I guess, and if that is maybe we could have \begin{equation} U_3 := \frac{\sum_{j=1}^{\hat{\sigma} n} \alpha^{j} + \sum_{j=1}^{\sigma n - \hat{\sigma}n} \alpha^{j(log(N))}} {\sum_{j=1}^{N-K}\alpha^{j}} \end{equation}

Conclusion

Finally, my conclusion is that the expected ratio $\mathbb E[Z]$ must be in-between \begin{equation} L \leq \mathbb E[Z] \leq U_3 \leq U_2 \leq U_1 \end{equation} where $0 \leq Z \leq 1$ is a continuous random variable expressing the ratio in question.

1

There are 1 best solutions below

2
On BEST ANSWER

Perhaps I'm misunderstanding the problem, but it seems quite simple to me.

We don't even need to make the distintion between red and white (weightless) balls.

Let $\alpha_i$ ($i=1,2 \cdots N$) be the weights of the $N$ balls. We know (actually this will be irrelevant for us) that $\alpha_i>0$ if $i\le N-K$ (red balls) and $\alpha_i=0$ if $i> N-K$ (white balls).

We pick $n<N$ balls at random, uniformly, without replacement. Let $C_i\in \{0,1\}$ , $i=1,\cdots N$ be a indicator random variable, taking value $1$ if ball number $i$ was picked, $C_i=0$ otherwise. We are interested in the expected value of

$$X=\frac{\sum_{i=1}^N C_i \alpha_i}{\sum_{i=1}^N \alpha_i} \tag1$$

Then, because the denominator is a constant, and because expectation is linear: $$E[X]=\frac{\sum_{i=1}^N E[C_i] \, \alpha_i}{\sum_{i=1}^N \alpha_i} = \frac{\sum_{i=1}^N \frac{n}{N} \, \alpha_i}{\sum_{i=1}^N \alpha_i} = \frac{n}{N} \tag2$$