Beginner-level question: how to derive the probability of a hash collision


A hash function indexes items into a hash table, and near items are then retrieved via hash-table lookup. A hash table is a data structure composed of buckets, each of which is indexed by a hash code.

The hash table is defined by a function family $G = \{g : S \rightarrow U^k\}$ such that $g(p) = (h_1(p),\ldots,h_k(p))$, where $h_i \in H$, giving a $k$-bit hash code. The query point $q$ is hashed into all $l$ hash tables via $\{g_1(q),\ldots,g_l(q)\}$. The candidate set $\{p_1,p_2,\ldots,p_m\}$ consists of the points in any hash table that are hashed into the same bucket as the query point $q$. Two different messages collide when they are assigned the same hash code; the probability of this is commonly analyzed via the birthday paradox.
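For concreteness, this construction can be sketched in Python. The choice of random-hyperplane hashes for the family $H$, and the helper names `make_h`/`make_g`, are illustrative assumptions on my part, not anything fixed by the definition above:

```python
import random

def make_h(seed, dim):
    """One random hyperplane hash h_i: the sign bit of a dot product
    with a random Gaussian vector (an illustrative choice of H)."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(dim)]
    return lambda p: int(sum(wi * pi for wi, pi in zip(w, p)) >= 0)

def make_g(k, dim, seed=0):
    """g(p) = (h_1(p), ..., h_k(p)): a k-bit hash code."""
    hs = [make_h(seed + i, dim) for i in range(k)]
    return lambda p: tuple(h(p) for h in hs)

g = make_g(k=8, dim=3)
p, q = (1.0, 0.2, -0.5), (1.1, 0.25, -0.45)  # two nearby points
print(g(p))
print(g(q))  # nearby points tend to share many bits, often the whole code
```

With several such functions $g_1, \ldots, g_l$, each code is used as a bucket index, and the candidate set is everything sharing a bucket with the query in at least one table.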

A desirable property of a good hash function is that hash collisions should be rare (they cannot be avoided entirely once there are more keys than hash codes, by the pigeonhole principle).

How can I calculate the probability of hash collision?

Best answer:

I'll start with the answer: the number of keys falling into a given bucket quite accurately follows a Poisson distribution:

$$P(n) = e^{-\lambda}\frac{\lambda^n}{n!}$$

Here, $\lambda$ is the average number of keys per bucket, equal to the number of keys $K$ divided by the number of buckets $B$. The absolute sizes of $K$ and $B$ are essentially irrelevant, so long as both are larger than about $10$; only their ratio matters.
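As a sanity check of this claim, here is a small Python simulation (uniform random bucket assignment stands in for an ideal hash function; the names are mine) comparing the empirical occupancy of one fixed bucket with the Poisson formula:

```python
import math
import random

def poisson_pmf(n, lam):
    return math.exp(-lam) * lam**n / math.factorial(n)

def bucket_counts(K, B, trials=200, seed=0):
    """Hash K uniformly random keys into B buckets repeatedly and
    tally how often a fixed bucket holds exactly n keys."""
    rng = random.Random(seed)
    tally = {}
    for _ in range(trials):
        counts = [0] * B
        for _ in range(K):
            counts[rng.randrange(B)] += 1
        n = counts[0]  # occupancy of one fixed bucket
        tally[n] = tally.get(n, 0) + 1
    return {n: c / trials for n, c in sorted(tally.items())}

K, B = 1000, 500  # lambda = K/B = 2 keys per bucket on average
lam = K / B
empirical = bucket_counts(K, B)
for n in range(5):
    print(n, round(empirical.get(n, 0.0), 3), round(poisson_pmf(n, lam), 3))
```

With $K = 1000$ and $B = 500$ (so $\lambda = 2$), the empirical frequencies land close to the Poisson probabilities, and they stay close for any $K, B$ with the same ratio.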

An easy way to derive this formula is to start from the formula for the number of heads in $K$ throws of a biased coin:

$$P(n\text{ heads}) = \frac{K!}{n!(K-n)!}h^n(1-h)^{K-n}$$

In this formula, $h$ is the probability that a given throw will come out heads.

The hash collision problem for a single bucket can be viewed as a coin flip where "heads" means the key landed in the bucket and "tails" means it landed somewhere else. The probability of landing in a given bucket is $1/B = \lambda/K$, so we take $h = \lambda/K$:

$$P(n\text{ heads}) = \frac{K!}{n!(K-n)!}\left(\frac{\lambda}{K}\right)^n\left(1-\frac{\lambda}{K}\right)^{K-n}$$

Now let's suppose $K$ is very large. We make three approximations:

  • $K!/(K-n)!$ is roughly $K^n$.
  • $(1-\lambda/K)^{-n}$ is roughly $1$.
  • $(1-\lambda/K)^K$ is roughly $e^{-\lambda}$. This is a good formula to know; I use it again lower down.
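The third approximation is easy to check numerically; a minimal sketch:

```python
import math

lam = 2.0
for K in (10, 100, 1000, 100000):
    # (1 - lam/K)^K should approach e^{-lam} as K grows
    print(K, (1 - lam / K) ** K)
print("e^-lambda:", math.exp(-lam))
```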

Cancel the two factors of $K^n$, rearrange a bit, and you get the Poisson distribution.
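You can also watch the binomial distribution converge to the Poisson distribution as $K$ grows with $\lambda$ held fixed; a quick numeric check (the function names are mine):

```python
import math

def binom_pmf(n, K, h):
    """Exact binomial probability of n heads in K throws."""
    return math.comb(K, n) * h**n * (1 - h) ** (K - n)

def poisson_pmf(n, lam):
    return math.exp(-lam) * lam**n / math.factorial(n)

lam = 2.0
for K in (10, 100, 10000):
    h = lam / K
    print(K, [round(binom_pmf(n, K, h), 4) for n in range(5)])
print("Poisson:", [round(poisson_pmf(n, lam), 4) for n in range(5)])
```

Each row of binomial probabilities moves closer to the Poisson row as $K$ increases.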

But I think you are asking for the probability of two or more keys landing in any bucket, whereas I have so far given a formula for the number landing in a single bucket. The interesting case is when $\lambda$ is very small, because otherwise we're almost guaranteed to have a collision somewhere in the table. Let's look at the first few values of $P(n)$:

  • $P(0)$ is $e^{-\lambda}$, which is close to (a bit smaller than) $1$.
  • $P(1)$ is $e^{-\lambda}\lambda$, which is close to $\lambda$.
  • $P(2)$ is $e^{-\lambda}(\lambda^2/2)$, which is close to $\lambda^2/2$.
  • $P(3)$ is smaller by a further factor of $\lambda/3$, which is very small, and subsequent terms are even smaller. These cases basically don't happen, so let's ignore them.

$n=0$ and $n=1$ are not collisions, so the probability of a collision in a single bucket is about $P(2)$, which is about $\lambda^2/2$. Therefore, the probability of no collision in any bucket is about $P(\text{good}) = (1-\lambda^2/2)^B$. Here I have also made the approximation that the buckets are independent, which is good for large $B$.

We would prefer a formula involving $K$ and $B$, so let's substitute $\lambda = K/B$. This gives $P(\text{good}) = \left(1 - \frac{K^2}{2B^2}\right)^B = \left(1 - \frac{K^2/(2B)}{B}\right)^B$. For large $B$, this approaches $e^{-K^2/(2B)}$, by the same exponential limit used above.

This gives you the birthday result: the chance of a collision somewhere in the table becomes significant when $K^2$ approaches $2B$, i.e. when $K$ is on the order of $\sqrt{B}$.
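Finally, the approximation $e^{-K^2/(2B)}$ can be compared against the exact birthday-problem probability of no collision, $\prod_{i=0}^{K-1}(1 - i/B)$; a short sketch:

```python
import math

def p_no_collision_exact(K, B):
    """Exact probability that K uniformly hashed keys land in
    K distinct buckets out of B (birthday-problem product)."""
    p = 1.0
    for i in range(K):
        p *= (B - i) / B
    return p

B = 365  # the classic birthday setting: buckets = days of the year
for K in (10, 23, 50):
    exact = p_no_collision_exact(K, B)
    approx = math.exp(-K * K / (2 * B))
    print(K, round(exact, 4), round(approx, 4))
```

With $B = 365$, the probability of no collision drops below $1/2$ at $K = 23$, the classic birthday-paradox threshold, and the exponential approximation tracks the exact product closely.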