What concept should I learn in Probability to make a sense of this question?

Question

What concept should I learn in Probability to make a sense of this question?

97 Views Asked by Bumbble Comm At 05 May 2026 - 10:02

It has been years since I last studied Probability, please feel free to correct any mistakes in my approach.

~~What is the average number of distinct numbers in N random non-negative numbers less than a thousand?~~

Given integers from 0 to 999 (inclusive). Choose N integers such that all integers are equally likely to be selected. How many distinct integers do you have in N on average?

[Edit:

I changed the following question (from a textbook about algorithms) into the one I asked above.

What is the average number of distinct keys that FrequencyCounter will find among N random nonnegative integers less than 1,000, for N=10, 10^2 , 10^3 , 10^4, 10^5, and 10^6 ?

My understanding of the question is from 1...999 you select N numbers randomly, that is in no specific way. I think we can assume selecting each number is equally likely. Repetitions are allowed. Now do this for infinite times. Take a note of all results for each experiment somehow and assume all those results are equally likely. Now pick one of the results. How many distinct numbers are that in that result on average?

]

Here is my attempt:

I can choose 1000^N different lists of numbers, that is any number may repeat with any frequency in a list of size N.

If I were to choose sets (distinct numbers in the list) of size N, then the number of different sets should be P(1000, N) (permutations). However in the context of the question order of the elements should not be significant, so the correct number should be C(1000, N) (combinations).

The probability of me choosing all distinct numbers should be Px = C(1000, N)/(1000^N).

Now, how do I get to calculate an average number? Is it 1000 * Px? Somehow this doesn't sound right.

By the way, is my understanding of average number correct? I mean, I do this experiment infinite times and the number of distinct numbers in that set converges to a number which we call average? Though I am guessing that number should depend on the distribution of distinct numbers and calculated accordingly.

Original Q&A

There are 3 best solutions below

**Bumbble Comm** · Answer 1 · 2019-09-24 19:37:35

Not a complete answer, but a rephrasing of the question and some remarks related to your questions.

I think your question can be reduced to the following.

Consider the set $U$ of all sequences $$ u_1, u_2, \ldots, u_n, $$ where each $u_1$ is an integer between $1$ and $1000$. $U$ can be made into a probability space by assigning equal probability, $\frac{1}{1000^n}$ to each sequence in $U$.

For a typical sequence $u$, the number of distinct items in the sequence $u$ is some number between $1$ and $n$; let's call that $f(u)$. Then $f$ is a function from $U$ to the set $\{1, \ldots n\}$. It has an "average value", which is $$ Av(f) = \sum_{u \in U} f(u) p(u) $$ where $p(u) = \frac{1}{1000^n}$ is the probability associated to $u$. Because this is a constant, it can be pulled out of the sum to get $$ Av(f) = \frac{1}{1000^n} \sum_{u \in U} f(u), $$ which reflects your idea of "count up distinct items in each list and average them".

The thing you computed in your description is rather different. There's a different function $g : U \to \{0, 1\}$ which assigns $1$ to $u$ if all items in the sequence are distinct, and $0$ otherwise. And you've figured out that the average value for $g$ is $C(1000, n) / 1000^n$. Unfortunately, that average value doesn't tell you much of anything about the average value of $f$.

Actually, it does tell you something. For the sum $$ \sum_{u \in U} f(u) $$ over all sequences can be estimated very crudely. Let's let $S \subset U$ consist of exactly those sequences whose entries are all different. Then $$ \sum_{u \in U} f(u) \ge \sum_{u \in S} f(u) $$ And for each $u \in S$, we know what $f(u)$ is: it's just $n$. So we can say \begin{align} \sum_{u \in U} f(u) &\ge \sum_{u \in S} f(u) \\ &= \sum_{u \in S} n \\ &= n \sum_{u \in S} 1 \end{align} which is just $n$ times the number of elements in $S$, which you've already determined is $C(1000, n)$. So we know that $$ Av(f) \ge \frac{1}{1000^n} \left(n \cdot C(1000,n) \right). $$ But that is, of course, just a crude estimate. So while your approach was partly correct, and told you something it doesn't really get you where you need to go.

To actually get the answer you need, you somehow need to sum up $f(u)$ for all $u \in U$. And to do that, @lulu's idea of linearity of expectation is probably the right road to go down, but I leave that to someone else to write out. I just wanted to (1) try to phrase the question unambiguously, and (2) try to address the question of how to get from the work you'd done to the answer you needed (short answer: you really can't, but you can get a weak estimate).

**Bumbble Comm** · Answer 2 · 2019-09-24 20:04:12

https://en.wikipedia.org/wiki/Multinomial_distribution

for given $N$ you would like to compute average number of distinct integers so:

$$\sum_{\sum x_i = N} (\sum_{j=1}^{1000}1_{x_j>0})\frac{N!}{x_{1}!x_{2}!...x_{1000}!}(\frac{1}{1000})^N$$

this is the formula in general I don't know if it can be simplified

**Bumbble Comm** · Answer 3 · 2019-09-24 20:23:21

Let us define $$ X_i = \mathbb{1}\{i \text{ is picked}\}. $$ Notice that its expected value is $$ \mathbb{E} X_i = \mathbb{P}[i\ \text{is picked}] = 1 - \mathbb{P}[i\ \text{not picked}] = 1 - \Big(1 - \frac{1}{1000}\Big)^N. $$ To be clear here, we choose to look at the complementary event of $\{i\ \text{is picked}\}$ because whenever $i$ is picked it can happen in a variety of ways, maybe it was picked once, maybe a lot of times. However, on the complementary event we just have to ask that $i$ was not picked in any of the $N$ attempts.

The random variable that counts how many different numbers were picked is simply $$ \sum_{i=1}^{1000} X_i $$ and its expected value is $$ \mathbb{E} \Big[ \sum_{i=1}^{1000} X_i \Big] = 1000 \cdot \Big[1 - \Big(1 - \frac{1}{1000}\Big)^N\Big]. $$ Just to emphasize some important aspects of the problem, notice that the $X_i$ are not independent, since knowing that $i$ was picked makes it less probable that the other numbers were also picked. However, here we are just using linearity of expectation to separate into individual $\mathbb{E}[X_i]$ and then the fact that since we are choosing each number uniformly, the above expectation is equal for every value of $i$.

Finally, I checked the expected value for $N = 10, 10^2, 10^3, 10^4, 10^5$, using wolfram alpha. The corresponding values were respectively:

$$ 9.95512, 95.2079, 632.305, 999.955, 1000 $$

What concept should I learn in Probability to make a sense of this question?

There are 3 best solutions below

Related Questions in PROBABILITY

Related Questions in PROBABILITY-DISTRIBUTIONS

Trending Questions

Popular # Hahtags

Popular Questions