Expected number of times an object is picked from a randomly ordered list of distinct objects

233 Views Asked by At

There are $x$ subscribers to a weekly reading list which contains $y$ distinct articles. The subscribers are sent the exact same list of articles but each subscriber receives the weekly list in random order.

Each article $y_i$ in the list has an estimated reading time $r_i$ associated with it.

It is estimated that, on average, a subscriber reads the articles in the order listed on the her/his list and spends $z$ minutes per week reading the articles on the list, where $z < r_1 + r_2 + \dots + r_y$.

Let $y_e$ be an article in the weekly list. What's the shortest way to compute the expected value of the number of subscribers who get to read article $y_e$?

2

There are 2 best solutions below

0
On

Define a permutation $\pi_k(.): \{1,2,\cdots , y\} \to \{\pi_k(1),\pi_k(2),\cdots , \pi_k(y)\}$. There are $y!$ such permutations of the $y$ elements in the list.

Further, define the usual Heaviside function $\Theta(x) = \{1 (x > 0) ; 0 $ (else) $\}$. Assuming all permutations are equally probable, the probability that article $y_e$ is being read is $$ P(y_e) = \frac{1}{y!}\sum_{k=1}^{y!}\Theta(z - (r_{\pi_k(1)}+r_{\pi_k(2)}+\cdots + r_{\pi_k(y_e)})) $$
Note that the number of summands of the sum inside the argument varies, according to the specific permutation: if $\pi_k(y_e) = 1$, the sum has only one element. If $\pi_k(y_e) = y$, the sum has $y$ elements.

The expected number of subscribers who read article $y_e$ is $x \cdot P(y_e)$.

0
On
  1. If you choose a permutation of magazines uniformly at random, then compute the total reading time of the magazines that occur before $y_e$ in the list, you get the same distribution as if you choose an unordered subset of magazines uniformly at random ("the set of magazines before $y_e$") and compute their total reading time.

  2. Therefore, the question "What's the probability that the reader gets to magazine $y_e$?" is the same as "What's the probability that a randomly-chosen subset of magazines (excluding $y_e$) has a sum less than $z$?".

  3. This is a computationally straightforward, if tedious, counting procedure to perform. Because the reading times can differ arbitrarily, we just have to count the number of subsets whose sum is less than $z$ and divide it by the total number of possible subsets $(2^{y-1})$.

    Here's an efficient algorithm to count the subsets whose sum is too large. Given a list $[r_1,\ldots, r_k]$ of positive numbers ordered from largest to smallest (these are the reading times of all magazines except $y_e$), define $f([r_1,\ldots, r_k], z)$ to be the number of subsets of $[r_1,\ldots, r_k]$ whose sum is greater than $z$. We can define $f$ recursively by $$f(\varnothing, z) = 1\text{, if }z\leq 0\text{ else }0.$$ $$f([r_1,\ldots, r_k], z) = 2^k-1 \quad \text{ if }z\leq 0$$ $$f([r_1,\ldots, r_k], z) = f([r_2,\ldots,r_k], z-r_1) + f([r_2,\ldots, r_k], z)$$

    This algorithm is: try creating subsets, starting from the largest elements. If the sum gets too big, stop and try starting from a smaller element.

  4. Once you can count the number of subsets that are too large, the expected number of readers who reach $y_e$ is just the number of readers times the probability that a random reader will reach $y_e$:

    $$E = \left[1-\frac{f([r_1,\ldots,r_n], z)}{2^{n}}\right] x$$