Calculating probability of no hash collision

778 Views Asked by At

Given a 64-bit hash function that takes arbitrary inputs, what is the probability that feeding 10 million inputs into the hash function will outputs 10 million unique outputs. I've came up with this: $$ \begin{aligned} P(10,000,000)&=1\times\left(\frac{2^{64}-1}{2^{64}}\right)\times\left(\frac{2^{64}-2}{2^{64}}\right)\times\cdots\times\left(\frac{2^{64}-9,999,999}{2^{64}}\right)\\ &=\frac{2^{64}!}{(2^{64})^{10,000,000}(2^{64}-10,000,000)!} \end{aligned} $$ However the numbers are too big for a standard calculator to handle, what are the better ways to calculate this for a arbitrary $n$ other than 10 million? Please explain how they work.

1

There are 1 best solutions below

0
On BEST ANSWER

Wikipedia's birthday problem article gives various approximations for the probability of no collisions (better at the top, simpler at the bottom)

  • $\exp\left(-\frac{n(n-1)}{2d}\right)$
  • $\exp\left(-\frac{n^2}{2d}\right)$
  • $1-\frac{n^2}{2d}$

Here $n=10^7$ and $d = 2^{64}$ and all of these are about $0.9999972895$

By comparison, the more familiar case of $n=23$ and $d=365$ would give approximations of $0.5000,$ $0.4845$ and $0.2753$ when the true value should be about $0.4927$