Approximating an inequality with extremely large numbers

64 Views Asked by At

I am trying to work out the number of 12 item sequences needed to have a greater than 50% probability that two of these are the same. So far, I have got $$ 1 - \frac{2048^{12}!}{\left(2048^{12}-n\right)! \times 2048^{12n}} \ge 0.5, $$ where $2048$ is the number of items to choose from. Whats the technique to approximate $n$ with such large numbers?

2

There are 2 best solutions below

0
On

You divide them, carefully. Let $w = 2048^{12} = 2^{132}$ be the total number of seed phrases. If we choose $n$ of them randomly, the probability that we do not get a collision is, as you've calculated,

$$\frac{w!}{(w - n)! w^n} = \prod_{i=1}^{n-1} \left( 1 - \frac{i}{w} \right).$$

A simple and useful approximation, mentioned on the Wikipedia article on the birthday paradox, comes from using $1 - x \le e^{-x}$ (this is a good approximation if $x$ is small), which gives

$$\prod_{i=1}^{n-1} \left( 1 - \frac{i}{w} \right) \le \exp \left( - \sum_{i=1}^{n-1} \frac{i}{w} \right) = \boxed{ \exp \left( - \frac{ {n \choose 2} }{w} \right) }.$$

The smallest value of $n$ for which this bound is less than $\frac{1}{2}$ is given by solving ${n \choose 2} = w \log 2$, which gives approximately $n \approx \sqrt{2w \log 2}$ in general, and in this case gives $n \approx (\sqrt{2 \log 2}) 2^{66} \approx \boxed{ 8.7 \times 10^{19} }$.

To get a sense of how accurate this approximation is, taking logarithms gives

$$\log \prod_{i=1}^{n-1} \left( 1 - \frac{i}{w} \right) = \sum_{i=1}^{n-1} \log \left( 1 - \frac{i}{w} \right).$$

For small values of $x$ we have $\log (1 - x) \approx -x$ (this is the logarithm of our previous approximation $1 - x \approx e^{-x}$; more precisely for $x \in (0, \frac{1}{2}]$ we have $-x - x^2 \le \log (1 - x) \le -x$ which applied to the above sum gives

$$- \frac{ {n \choose 2} }{w} - \sum_{i=1}^{n-1} \frac{i^2}{w^2} \le \sum_{i=1}^{n-1} \log \left( 1 - \frac{i}{w} \right) \le - \frac{ {n \choose 2} }{w}$$

so the error in the above approximation, in the exponent, grows like $\frac{n^3}{w^2}$. Since $n \approx \sqrt{w}$ above this grows like $\frac{1}{\sqrt{w}}$ which is tiny. So our approximation above is very close.

0
On

Using the same notations as @Qiaochu Yuan in his answer, we look for $n$ such that $$2( w!)\leq (w-n)!\,\, w^n$$

Taking logarithms, using Stirling approximation for large values of $w$ and continuing with Taylor expansions, we end with $$0\leq -\log (2)+\frac{n^2-n}{2 w}+O\left(\frac{1}{w^3}\right)\lt -\log(2)+\frac{n^2}{2 w} \implies n > \sqrt{2 w \log (2)}$$ as already given by @Qiaochu Yuan.