With the knowledge that the probability of a hash collision is (see: Hash Collision Probabilities):
1 - e^((-k * (k - 1)) / (2 * n))
Where k is the number of input values and n is the number of possible hash values. I want to find the solution for n, given the probability of collisions is z:
1 - e^((-k * (k - 1)) / (2 * n)) - z = 0
Utilizing Mathematica, I entered:
Reduce[1 - e^((-k*(k-1))/(2*n)) - z == 0, n]
Output:
C[1] \[Element] Integers
&& -1 + z != 0
&& ((n != 0 && (k == 0 || k == 1) && z == 0)
|| (2 I \[Pi] C[1] + Log[1/(1 - z)] != 0 && (-1 + k) k != 0
&& n == ((-1 + k) k)/(2 (2 I \[Pi] C[1] + Log[1/(1 - z)]))))
Now, I'm not sure what value to use for C[1] and the meaning of 'I'. I need to translate the equation into C code.
If I instead use Solve and limit the domain to Reals:
I get:
$$n = \frac{-k + k^2}{2log(\frac{1}{1 - z})}$$
Where (0 < k < 1 && 0 < z < 1) || (0 < k < 1 && z < 0) || (k > 1 && 0 < z < 1) || (k > 1 && z < 0) || (k < 0 && 0 < z < 1) || (k < 0 && z < 0)