Hashing function probability of collision, is this solution wrong?

337 Views Asked by At

We hash n keys into k=1000 memory locations one by one. What is the probability that the first i records do not produce a collision. Assume each key is independently and uniformly hashed into the memory locations.

The solution is given as follows: 1000/1000 * 999/1000 * ... * (1000-(i-1))/1000

I get the logic behind that solution but it doesn't seem correct. The answer should depend on n as well. For example, if there were n=1001 keys then there would be a collision with probability 1 somewhere in the list of 1000 memory locations, perhaps not in the first i records though. But as n grows larger and larger, the probability that there is a collision at any particular record (and in the first i records) increases and could be made arbitrarily close to 1 with a large enough choice of n. So having an answer independent of n makes no sense unless the problem was not worded correctly. In which case, the given solution might make more sense. But the problem as stated seem much more difficult to solve than just the simple solution given. (Which makes me think that this was not the intent of the question. But the wording is misleading.)

I am assuming that all n keys will be hashed into memory locations before analyzing the probability. Otherwise, what's the point of mentioning how many keys there are? If they are just talking about hashing the first i keys and then stopping then what's the point of saying that there are n keys being hashed?

1

There are 1 best solutions below

3
On

There isn't really a point to mentioning the total number of keys that will be hashed if we're only going to analyze the first $i$ keys. Indeed, they never tells us what $n$ is. So it makes sense that the given solution is independent of $n$ but dependent on $i$ and $k$.

Note that for the case where $i = 1001$ (which is greater than $k = 1000$), the probability that the first $1001$ records do not produce a collision is: $$ \frac{1000}{1000} \cdot \frac{999}{1000} \cdot \cdots \cdot \frac{2}{1000} \cdot \frac{1}{1000} \cdot \frac{1000 - (1001 - 1)}{1000} = 0 $$ which agrees with our intuition.