Why do we pick ax + b when doing universal hashing?

1.1k Views Asked by At

Why can't we pick just $x + b$ for example? I know that for universal hashing the Pr[$h(x) == h(y)$] $\leq 1/|n|$

But with the same hash function, just without the $a$ we can get $h(k) = (($k$ + b$ mod $p$) mod $n$).

I've run through a million examples and I can't find a single reason why I might be wrong.

2

There are 2 best solutions below

1
On BEST ANSWER

You pick $p\ge n$ and your family consists of $h_b(x)=(x+b\bmod p)\bmod n$, where $b$ ranges over $0,\ldots, b-1$. To show that this family is universal, we need that for any $0\le x<y<n$, the probability that $h_b(x)=h_b(y)$ for a uniform choice of $b$, is $\le \frac 1n$. Note that if $p=2k+1>n+1$ and $n>k+1$, then $h_k(k+1)=0$ and $h_k(n-k)=0$. Also, $h_{k+1}(k+1)=1$ and $h_{k+1}(n-k)=1$. Thus for $x=k+1\ne n-k=y$, the probability of $h_b(x)=h_b(y)$ is at least $\frac 2{p}$. Unfortunately, this may well be $>\frac 1n$.

You'd be better of to consider $h_b(x)=x+b\bmod n$. But that would not really be hash-y after all

0
On

For a class of hash-functions to be universal, we require that for any two keys $x$ and $y$, the probability of collision is within $1/m$. That means, atmost $1/m$ of the functions in that class can give collision for the two keys.

Now suppose the class is designed using $h_{a,b} = ((ax + b)\bmod p)\bmod m$ (this is one of the standard method). But if we fix $a=1$, then consider two keys for example: $x = X$ and $y = X+m$, such that $X+m < p$. Their hashes will be:

$((X+b)\bmod p)\bmod m, ((X+m+b)\bmod p)\bmod m$

But the second value actually equals the first, for all values of $b$. Thus, without having $a$ as a variable in this modified class, we have both these keys colliding for all functions in the class. So this class can never be a universal-class.

That is why having variable $a$ is useful in this design.

It will also be useful to go through the proof of why such class is universal from some book (e.g. Cormen et al's "Introduction to Algorithms" (chapter "Hash Tables")).