Picking $M^i$ random numbers: how many steps $i$ are needed until at least $1$ out of $S$ numbers has been picked? (mean value)


Given a vector $v_0 = (a,b,c)$ containing 3 random numbers $a,b,c \in [1..N]$, with $N$ a very large number.

Starting from $v_0$, $M$ new vectors are generated for each dimension. Each of them shares exactly one number with $v_0$ (and not more); the other two dimensions are fresh random numbers.

For $v_0$ that would be: $$v_k = (a,b_k,c_k) \text{ with } k = 1..M,\; b_k,c_k \in [1..N],\; b_k\ne b,\; c_k\ne c,\; \text{and } b_i\ne b_j,\; c_i\ne c_j \text{ for } i\ne j$$ $$v_l = (a_l,b,c_l) \text{ with } l = 1..M,\; a_l,c_l \in [1..N],\; a_l\ne a,\; c_l\ne c,\; \text{and } a_i\ne a_j,\; c_i\ne c_j \text{ for } i\ne j$$ $$v_m = (a_m,b_m,c) \text{ with } m = 1..M,\; a_m,b_m \in [1..N],\; a_m\ne a,\; b_m\ne b,\; \text{and } a_i\ne a_j,\; b_i\ne b_j \text{ for } i\ne j$$ In every subsequent step, only $2M$ new vectors are generated from each vector of the previous step. These new vectors must share one of the numbers that was generated in the previous step. So e.g. any $v_k = (a, b_k, c_k)$ generates $M$ new vectors with the same $b_k$ and $M$ with the same $c_k$. As before, they share only this single number with their base $v_k$ (and are pairwise distinct among each other).


If a newly generated vector equals any vector generated before, it is ignored (or, to be exact, it would reproduce the same pseudo-random numbers as before).
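To make the generation rule concrete, here is a minimal Python sketch of one generation step. The function name `step` and the toy value of $M$ are my own choices; the collision/duplicate check described above is omitted, since for huge $N$ accidental duplicates are vanishingly rare.

```python
import random

N = 10**9          # size of the value range [1..N]; very large in the original problem
M = 4              # vectors generated per shared dimension

def step(base, shared_dims):
    """For each dimension index in shared_dims, generate M vectors that
    keep that coordinate of `base` and redraw the other two."""
    out = []
    for d in shared_dims:
        for _ in range(M):
            v = tuple(base[i] if i == d else random.randint(1, N)
                      for i in range(3))
            out.append(v)
    return out

v0 = tuple(random.randint(1, N) for _ in range(3))
gen1 = step(v0, shared_dims=(0, 1, 2))   # first step: 3*M vectors
# In later steps, each vector would spawn 2*M children, each sharing one of
# its two freshly drawn coordinates (shared_dims restricted accordingly).
```

With $N = 10^9$, each generated vector shares exactly one coordinate with its base with overwhelming probability, matching the description above.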


Question

How many such steps are needed until at least one of the generated vectors equals a vector from a given set of size $S$?

Due to the randomness involved, this can range from $0$ to infinity. I'm interested in its mean value.

$$M \ll S \ll N$$


Solving Trial

1.) Simplification

  • If we combine two steps, each given vector generates $4M^2$ new vectors. These are independent of the given vector (they share no dimension values with it). The one exception is the combination of the first and second step: there we get $6M^2$ vectors that are independent of $v_0$, so basically random vectors in their own right.
  • With this simplification, each vector can be treated as a single random number of maximum size $P = N^3$.

2.) Approximation

  • Due to the exponential growth we can ignore the impact of vectors/numbers that were already picked before. A result accurate to $\pm 2$ steps would be enough.

3.) Trial

3.1) Number of elements

Given a random number $n_0 \in [1..P]$, the first (combined) step generates $6M^2$ new random numbers. In the second and every subsequent step, each number from the previous step yields $4M^2$ new ones. So at step $i$:

$$n_0 = 1$$ $$n_i = 6 M^2 \cdot (4 M^2)^{i-1} = \frac{3}{2}(2 M)^{2i}$$

The sum of those (approximating $n_0 = 1$ by the $j=0$ term $\frac{3}{2}$) would be:

$$s_i = \sum_{j=0}^{i} \frac{3}{2}(2M)^{2j} = \frac{3}{2}\,\frac{(2M)^{2i+2}-1}{(2M)^2-1}$$
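Since the geometric ratio here is $(2M)^2 = 4M^2$, the closed form is easy to sanity-check numerically against the direct sum (a small sketch with toy values of $M$ of my own choosing):

```python
# Direct geometric sum of n_j = (3/2) * (2M)^(2j), j = 0..i
def s_direct(i, M):
    return sum(1.5 * (2 * M) ** (2 * j) for j in range(i + 1))

# Closed form with ratio (2M)^2
def s_closed(i, M):
    return 1.5 * ((2 * M) ** (2 * i + 2) - 1) / ((2 * M) ** 2 - 1)

for M in (2, 3, 5):
    for i in range(6):
        assert abs(s_direct(i, M) - s_closed(i, M)) < 1e-9 * s_direct(i, M)
```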


3.2) Chance one out of $S$ is picked

If the goal is to hit one out of $S$ given numbers, the chance that a single draw is a hit is:

$$p({\text{hit}}) = \frac{S}{P}$$

The chance of not having any after $i$ steps should be:

$$p(\text{none after $i$ steps}) = (1-\frac{S}{P})^{s_i}$$
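Plugging in small illustrative numbers (my own choice, only roughly respecting $M \ll S \ll N$) shows how sharply this probability collapses once $s_i$ approaches $P/S$. The sketch below uses the geometric-sum closed form with ratio $(2M)^2$:

```python
M, N, S = 5, 10**4, 10**3      # illustrative toy values
P = N**3                        # = 10**12

def s(i):
    # cumulative count of numbers generated after i combined steps
    return 1.5 * ((2 * M) ** (2 * i + 2) - 1) / ((2 * M) ** 2 - 1)

def p_none(i):
    # chance that none of the s(i) generated numbers hits the target set
    return (1 - S / P) ** s(i)

# p_none(i) stays near 1 while s(i) << P/S, then drops off very quickly.
```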

3.3) Further simplification

If, instead of counting steps, we just draw single numbers until we get one from $S$, the mean number of draws is

$$E(n) = \frac{P}{S}$$
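This geometric-distribution mean is easy to confirm by simulation with small toy values (my own, far below the real magnitudes):

```python
import random

random.seed(1)
P, S = 10_000, 100                 # toy values; expected mean is P / S = 100
hits = set(random.sample(range(1, P + 1), S))

trials = 2_000
total = 0
for _ in range(trials):
    n = 0
    while True:                    # draw until a number from the hit set appears
        n += 1
        if random.randint(1, P) in hits:
            break
    total += n
mean_draws = total / trials        # should land close to P / S
```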

To find the mean number of steps needed we could compute the first $i$ with: $$s_i > E(n)$$ $$\frac{3}{2}\,\frac{(2M)^{2i+2}-1}{(2M)^2-1} > \frac{P}{S}$$ $$(2M)^{2i+2} > \frac{2P\left((2M)^2-1\right)}{3S}+1$$ $$i > \frac{1}{2}\log_{2M}\!\left(\frac{2P\left((2M)^2-1\right)}{3S}+1\right)-1$$
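For concreteness, the threshold can be evaluated for sample magnitudes (illustrative values of my own; the cumulative count uses the geometric-sum closed form with ratio $(2M)^2$):

```python
import math

M, N, S = 5, 10**6, 10**3
P = N**3

# smallest real i with s_i > P/S, from the logarithmic bound
i_min = 0.5 * math.log(2 * P * ((2 * M) ** 2 - 1) / (3 * S) + 1, 2 * M) - 1

def s(i):
    # cumulative count of numbers after i combined steps
    return 1.5 * ((2 * M) ** (2 * i + 2) - 1) / ((2 * M) ** 2 - 1)

i_star = math.ceil(i_min)
# s(i_star) exceeds P/S while s(i_star - 1) does not
```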


Result?

The result of 3.3 needs to be doubled (each combined step corresponds to two original steps) to obtain a solution to the initial problem. Does this qualify as a valid result, or is there a better/correct way to compute the mean number of steps?

Possible problems/bonus questions:

  • Is the impact of re-picking already picked numbers too big to be ignored?
  • Does one of the other simplifications break down?
  • In the actual problem $M$ is not a constant: the probability decreases exponentially with size. The minimum value is about 1000 times smaller, but it can also be over 20 times bigger. The mean value is known (represented here as $M$). Given the large number of random values involved, would this have any big impact?