Expected length of conditional random sequence of n integers

427 Views Asked by At

Assume we generate a random number sequence of length $m$ from a set $\{1, 2, 3,…,n\}$of $n$ integers. Each integer has probability $1/n$ of being selected. The set of all possible sequences of length $m$ contains exactly $n^m$ different sequences.

We now define a sequence as complete if each of the $n$ integers appear at least once and incomplete if at least one of the $n$ integers does not appear.

Obviously the set of all possible sequences of length $m$ is the union of the complete and incomplete sets of sequences so $a+b= n^m$, where $a$ is the total number of complete sequences of length $m$ and $b$ is the total number of incomplete sequences of length $m$.

We can generate any possible incomplete sequence of length m as follows: From the set of integers we exclude a specific integer, say $3$, thus leaving only $n-1$ selectable integers. The number of ways we can generate a specific sequence from the remaining $n-1$ integers is $(n-1)^m$. Since we can choose to omit any other integer than $3$, the total number of sequences with any one of the $n$ integers excluded is $n(n-1)^m$ and all such sequences will be incomplete.
Hence $b = n(n-1)^m$ and it follows from $a = n^m - b$ that the total possible number of complete sequences is $a = n^m - n(n-1)^m$. From this we deduce that the probability $P(m,n)$ of obtaining a sequence of length $m$ that is complete is simply $P(m,n) = 1-n(1-1/n)^m$. Obviously this probability must be zero for any $m$ less than $n$.

With this in mind the question is: What is the average length of $m$ (expected value) and its variance pertaining to getting a complete sequence of any length in the range $n$ to infinity?

Reason: I can not be sure of the distribution function of $m$.

2

There are 2 best solutions below

3
On

There's something off about your counting here: we don't actually have $n(n-1)^m$ incomplete sequences as you're double counting those that are missing $2$ numbers, and so on. To count this number explicitly, you'd have to use the principle of inclusion exclusion.

I'm not entirely sure if this is what you're asking, but it seems like you want to draw numbers until you get a complete sequence, and let $m$ be the number that was drawn. This is referred to as the Coupon Collector's problem, whose mean and variable can be calculated explicitly; see the Wikipedia link for details.

0
On

After quite a lot of work, I think I finally got the counting right. I read the Wikipedia paper on the Coupon Collectors problem, but it only talks about the statistics involved but not how the probabilities are derived, nor the number of possible complete and incomplete sequences. Out of interest and for my own satisfaction I just wrote the following answer using MS Word. Since I am not a Math Jax user I have resigned to just show screenshots from the document. I might add, that the first table in the last slide has been calculated using equation 6) and verified by having my computer write down all possible sequences (up to n = 6) and have it actually count the occurrences of complete sequences.

enter image description here enter image description here enter image description here

enter image description here

enter image description here

enter image description here

enter image description here