We have to draw exactly N random numbers from a random uniform distribution, and want it so that, on repeated experiments, on average half of the unique numbers drawn will be smaller than M/2. The question is what should the range of numbers in the distribution that we draw the N numbers from be, such that on average half will be smaller than M/2? (note that we only count unique numbers smaller than M/2.)
Example (here N is 306 and M is 128):
suppose that we want to draw exactly 306 random numbers, from the range 1..P. We want to have on average (on repeated experiments) 64 unique random numbers from the range 1..128, and the rest to be greater than 128 (we don't care what the other numbers are.) What should the value of P be?
M = round(1/(1-(0.5)^(1/N))
for instance, a solution to the specific example using a MATLAB anonymous function:
maxhash=@(n) round(1/(1-0.5^(1/n))); M = maxhash(306); and the solution is 442.
The idea is to consider the expected number of collisions in a hash table with a uniform, random hash function and M buckets and N insertions, given by: E = N - M * (1 - ((M-1)/M)^N)
Now, we want to draw N times and ensure that after eliminating collisions, we have M/2 unique values from the range 1..M We can compute the requisite value of M by solving for M with the setting of E = M/2;