If I ask $1000$ people to choose a random number between $0$ and $999$, what is the probability that no one will choose a specific number?

2.3k Views Asked by At

Imagine I asked $1000$ people to choose a number between $0$ and $999$ (both inclusive, the numbers are not biased, they will be completely random) and write that number down. Now, after that, pick a number, $x$, where $0\le x \le 1000$. What is the probability that none of people that I asked will have chosen that number?

I ran a simulation in Python to do this $10000$ times (code), and in $37.22\%$ of the cases, no one choose the $x$.

I would like to know a simple way to calculate this for any amount of people.

5

There are 5 best solutions below

0
On

The odds that any particular person does not pick your number is $\frac{999}{1000}$, so the odds nobody picks it will be $$\left(\frac{999}{1000}\right)^{1000} \approx 0.368$$

3
On

For $n$ people, the probability is $$\left(1-\frac1n\right)^n\approx \frac1e$$ To explain that, notice that the natural logarithm of $1+x$ is close to $x$ when $x$ is small. (Try it for $\ln(1.1),\ln(1.01),\ln(1.001)$).
So the log of the left-hand side is $n\log(1-\frac1n)\approx(n(-\frac1n))=-1$.
Since the log is near $-1$, the probability is near $1/e$.

0
On

Every solution can be represented by tuples from $$ A = \{0, \dotsc, M\}^N $$ where $M$ is the largest number to choose from and $N$ is the number of people asked, so you simply list their choices.

There are $$ \lvert A \rvert = (M+1)^N $$ possible choices.

The event you are interested in, is that all persons will not note down a certain number, we can use $0$ as that number. Then all such events would result in a tuple from $$ B = \{1, \dotsc, M\}^N $$ where there are $$ \lvert B \rvert = M^N $$ So the probability of such events is $$ p = \frac{\lvert B \rvert}{\lvert A \rvert} = \left( \frac{M}{M+1}\right)^N $$

3
On

There are 1001 numbers to choose from, so each person has a probability of 1000/1001 (just over 0.999) of not picking your chosen $x$.

Since each person chooses independently, the probability that nobody chooses $x$ is the product of the individual probabilities: $(1000/1001)^{1000}$, or more generally $(1000/1001)^N$ when $N$ people are choosing. My python interpreter is calculating this as $0.368063...$, which is a bit lower than the $0.372$ from your simulation.

1
On

I'm going to assume that the question asks for the probability that none of $N$ independent and identically distributed random variates drawn from the discrete uniform distribution $U[0,M)$ will be equal to separate random variate drawn from the same distribution.

The general case is quite easy. This is a binomial distribution, with parameter $p=\frac1M$. The probability that one given person will draw the correct number is exactly equal to $p$. The probability that that person will not draw the correct number is $1-\frac1M = \frac{M-1}M$. The probability that not a single one of the $N$ people involved in the experiment will draw the given number is $$P=\left(\frac{M-1}M\right)^N$$

In the case where $N=1000, M=1001$ (e.g. the code), this becomes $\left(\frac{1000}{1001}\right)^{1000} \approx 0.36806$. If $N=M=1000$ (the intent of the question?), this becomes $\left(\frac{999}{1000}\right)^{1000} \approx 0.36770$. Note that both are quite close to $1/e \approx 0.36788$. This is not a coincidence. Both $\left(\frac{N}{N+1}\right)^N$ and $\left(\frac{N-1}N\right)^N$ are fairly good estimates of $1/e$ for large $N$; the average of the two is an even better estimator of $1/e$.


I ran a simulation in Python to do this 10000 times, and in 37.22% of the cases, no one chose the $x$.

You are overstating the precision of your Monte Carlo simulation. It would have been much better to have said 37% rather than 37.22%. Your 10000 cases yield about two decimal places of accuracy. You would need to run that Monte Carlo simulation about one million times to get three places of accuracy, and about 100 million times to get four places of accuracy.

I'm a big fan of what I call "dumb Monte Carlo" because it's so easy to set up. That said, it is indeed "dumb" because of the logarithmic growth of precision that results. Better techniques do exist (e.g. bootstrap, jackknife, Markov chain Monte Carlo), but these are a bit trickier to set up.