I am from ukraine and I start taking course on the computer programming. My classmate tells me about this website for interesting discussion and information about mathematics.
Today we learn about problem where there is 100 unique numbers. We pick a number, record this number and put this number back ... then repeat this process 100 times (i think it is called sampling with replacement).
We are interested in knowing on the average, for this experiment: how many unique numbers we will get, how many non-unique numbers we will get and how many numbers will be unselected.
Here is the python code we were given for this experiment - this experiment is now repeated 500 times :
import numpy as np
# Set the number of numbers, the number of samples, and the number of repetitions
n = 100
k = 100
reps = 500
# Initialize arrays to store the results
n_unique_arr = np.zeros(reps)
n_non_unique_arr = np.zeros(reps)
n_unselected_arr = np.zeros(reps)
# Repeat the sampling process reps times
for i in range(reps):
# Generate k random samples with replacement from the sequence 1:n
samples = np.random.choice(range(1, n+1), k, replace=True)
# Calculate the number of unique numbers in the sample
n_unique = len(np.unique(samples))
# Calculate the number of non-unique numbers in the sample
n_non_unique = k - n_unique
# Calculate the number of unselected numbers
n_unselected = n - n_unique
# Store the results
n_unique_arr[i] = n_unique
n_non_unique_arr[i] = n_non_unique
n_unselected_arr[i] = n_unselected
# Calculate the average number of unique, non-unique, and unselected numbers
avg_n_unique = np.mean(n_unique_arr)
avg_n_non_unique = np.mean(n_non_unique_arr)
avg_n_unselected = np.mean(n_unselected_arr)
# Print the results
print("Average number of unique numbers:", avg_n_unique)
print("Average number of non-unique numbers:", avg_n_non_unique)
print("Average number of unselected numbers:", avg_n_unselected)
Average number of unique numbers: 63.37
Average number of non-unique numbers: 36.63
Average number of unselected numbers: 36.63
In general, is there some mathematics formula to understand where this numbers come from? How I can understand how to calculate these numbers in general without simulation results? Maybe there is some mathematics formula to determine this?
Suppose we pick $n$ times uniformly randomly from among $n$ numbers (in our particular case, $n = 100$). The of times a given number is picked follows a binomial distribution, $\operatorname{B}(n, p)$, where in our case $p = \frac{1}{n}$: Explicitly, the probability that a given number is picked exactly $k$ times $$\Bbb P(k) = {n \choose k} \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n - k} = {n \choose k} \frac{(n - 1)^{n - k}}{n^{n}} .$$ For $k = 0$, we have $$\Bbb P(0) = \left(1 - \frac{1}{n}\right)^n ,$$ and for $n = 100$, $$\Bbb P(0) = \left(1 - \frac{1}{100}\right)^{100}\approx 0.366032\ldots .$$ So, we expect that $100 \Bbb P(0) = \boxed{36.6032\ldots}$ of the $100$ numbers are not picked, in close agreement with your numerical simulation. Notice that for large $n$, $$\Bbb P(0) \approx \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = \frac{1}{e} = 0.367879\ldots,$$ which for $n = 100$ carries a relative error of about $1$ part in $200$. Asymptotically, $\left(1 - \frac{1}{n}\right)^n - \frac{1}{e} \in O\left(\frac{1}{n}\right)$.
Correspondingly, we expect that any given number has a probability $\Bbb P(k > 0) = 1 - \Bbb P(0) = 1 - \left(1 - \frac{1}{n}\right)^{n}$ of being picked at least once. For $n = 100$ this is $$\Bbb P(k > 0) = 0.633967\ldots,$$ so we expect that about $\boxed{63.3967\ldots}$ of the $100$ numbers are picked at least once. Again for large $n$, the probability that a given number is picked at least once is $\Bbb P(k > 0) \approx 1 - \frac{1}{e}$.
The probability that a chosen number is picked exactly once is $$\Bbb P(1) = \left(1 - \frac{1}{n}\right)^{n - 1}.$$ For $n = 100$, we have $$\Bbb P(1) = 0.369729\ldots ,$$ just slightly greater (by a factor of $\frac{100}{99}$) than the probability that a chosen number is not picked at all, hence we expect that $\boxed{36.9729\ldots}$ of the $100$ numbers will be picked once, again in close agreement with the simulation. Again for large $n$ we can approximate this probability by $$\lim_{n \to \infty} \left(1 - \frac{1}{100}\right)^{99} \approx \left(1 - \frac{1}{100}\right)^{100} \approx \frac{1}{e} .$$
When $n$ is large and $p$ is small, and the binomial distribution is well-approximated by the Poisson distribution $\operatorname{Pois}(np)$ of mean $\lambda := n p$, which is somewhat less cumbersome than the binomial distribution. This approximation yields $$\Bbb P(k) \approx \frac{e^{-\lambda} \lambda^k}{k!},$$ and for our case, wherein $\lambda = np = n \left(\frac{1}{n}\right) = 1$, this approximation simplifies to $$\Bbb P(k) \approx \frac{1}{k! e} .$$ Asymptotically, $\Bbb P(k) - \frac{1}{k! e} \in O\left(\frac{1}{n}\right)$. In particular, we see that $\Bbb P(k)$ depends only weakly on $n$ once $n$ is large.
So, for example, a given number has a probability $\Bbb P(2) \approx \frac{1}{2e} = 0.183939\ldots$ of being chosen exactly $2$ times, so we expect that $18.3939\ldots$ of the $100$ numbers to be chosen exactly $2$ times.