Expected value of an r.v. that counts the number of pickable values that never appear in a random sequence

34 Views Asked by At

A random, ordered sequence of $k$ items (e.g., integers) is generated by randomly selecting each item from a set $N$ of possible values, uniformly with replacement. Let $n = |N|$. There are at least as many possible values as there are selected items, i.e. $n \ge k$.

What is the expected number of elements in $N$ that would not appear in a given sequence?

My initial idea was to use the tail sum formula. Let $X$ be the number of elements in $N$ that do not appear in a given sequence. By the pigeonhole principle, we know that at least $n-k$ elements will never be used, and it's possible that up to $n-1$ elements will never be used (the sequence could consist of just one element repeated $k$ times). We could then compute:

$P(X \ge j)$, for $(n-k) \le j \le (n-1)$

and use an appropriate transformation with the tail sum formula:

$E(X) = \sum_{i=1}^m P(X \ge i)$

I thought that each of the $P(X \ge j)$ could be computed using a simple counting technique - divide the total number of sequences missing $j$ elements by the total number of possible sequences. The denominator would simply be $n^k$. However, I can't seem to find a closed formula for the numerator that avoids double-counting and verifies that $P(X \ge (n-k)) = 1$ and $P(X \ge n) = 0$.

Am I on the right track here?

1

There are 1 best solutions below

5
On BEST ANSWER

Remember that expectation is additive, even when the random variables you add are not independent.

So you can compute the expectation for number of values in $\{a_1\}$ that don't appear in the sequence (this number will be either $0$ or $1$, of course), and add to that the expectation of how many elements of $\{a_2\}$ that don't appear, and the expectation of how many element of $\{a_3\}$ that don't appear ...