A hash function indexes all items in hash tables and searches for near items via hash table lookup. The hash
table is a data structure that is composed of buckets, each of which is indexed by a hash code.
The hash table is defined the function family $G = \{g:S \rightarrow U^k\}$ such that $g(p) = (h_1(p),\ldots,h_k(p))$ , where $h_i ∈ H$ to obtain a $k$ bit hash code. The query point $q$ is hashed into all the hash table $\{g_1(p),\ldots,g_l(p)\}$. The candidate set, $\{p_1,p_2,\ldots,p_m\}$, is composed of the points in all the hash tables which are hashed into the same bucket with the query point $q$. Two hash codes for two different messages can collide if they have the same hash code. This is often called the Birthday paradox.
The properties of a good hash function is that there should be no hash collision.
How can I calculate the probability of hash collision?
I'll start with the answer: the number of keys falling into a given bucket quite accurately follows a Poisson distribution:
$$P(n) = e^{-\lambda}\frac{\lambda^n}{n!}$$
Here, $\lambda$ is the average number of keys per bucket, which is equal to the number of keys $K$ divided by the number of buckets $B$. The total number of keys and buckets is approximately irrelevant, so long as they are larger than about $10$; only the ratio of those two numbers matters.
An easy way to derive this formula is to start from the formula for the number of heads in $K$ throws of a biased coin:
$$P(n\text{ heads}) = \frac{K!}{n!(K-n)!}h^n(1-h)^{K-n}$$
In this formula, $h$ is the probability that a given throw will come out heads.
The hash collision problem for a single bucket can be viewed as a coin where "heads" means the hash landed in the bucket and "tails" means it landed somewhere else. Therefore, we take $h=\lambda/K$:
$$P(n\text{ heads}) = \frac{K!}{n!(K-n)!}\left(\frac{\lambda}{K}\right)^n\left(1-\frac{\lambda}{K}\right)^{K-n}$$
Now let's suppose $K$ is very large. We make three approximations:
Cancel the two factors of $K^n$ and rearrange a bit and you get the Poisson distribution.
But I think you are asking for the probability of two or more keys landing in any bucket, whereas I have so far given a formula for the number landing in a single bucket. The interesting case is when $\lambda$ is very small, because otherwise we're almost guaranteed to have a collision somewhere in the table. Let's look at the first few values of $P(n)$:
$n=0$ and $n=1$ are not collisions, so the probability of a collision in a single bucket is about $P(2)$, which is about $\lambda^2/2$. Therefore, the probability of no collision in any bucket is about $P(\text{good}) = (1-\lambda^2/2)^B$. Here I have also made the approximation that the buckets are independent, which is good for large $B$.
We would prefer a formula involving $K$ and $B$, so let's substitute $\lambda=K/B$. This gives $P(\text{good}) = (1-K^2/2B^2)^B = (1-((K^2/2B)/B))^B$. For large $B$, this approaches $e^{-K^2/2B}$.
This gives you the birthday result: the chance of a collision somewhere in the table becomes significant when $K^2$ approaches $2B$.