example 1.4 chapter 1 from mining of massive data sets book

303 Views Asked by At

I am reading the book mining of massive data sets. (http://mmds.org/) In its chapter 1 http://infolab.stanford.edu/~ullman/mmds/ch1.pdf following section is there on page 9.

Example 1.4: Suppose hash-keys are positive integers. A common and simple hash function is to pick h(x) = x mod B, that is, the remainder when x is divided by B. That choice works fine if our population of hash-keys is all positive integers. 1/Bth of the integers will be assigned to each of the buckets. However, suppose our population is the even integers, and B = 10. Then only buckets 0, 2, 4, 6, and 8 can be the value of h(x), and the hash function is distinctly nonrandom in its behavior. On the other hand, if we picked B = 11, then we would find that 1/11th of the even integers get sent to each of the 11 buckets, so the hash function would work very well.

I could not understand this statement

hen only buckets 0, 2, 4, 6, and 8 can

be the value of h(x), and the hash function is distinctly nonrandom in its behavior.

What is the example in this trying to say? The hash in my understanding is made by h(x)= x mod B , so if B=10 then x mod 10 will have all values from 0,1,2,3,4,5,6,7,8,9.

1

There are 1 best solutions below

5
On BEST ANSWER

This is a sentence a few lines above the particular example

To be precise, if hash-keys are drawn randomly from a reasonable population of possible hash-keys, then h will send approximately equal numbers of hash-keys to each of the $B$ buckets.

This property is not satisfied, in particular, odd-numbered bins would not have any entry if $B=10$.

However, if $B$ is $11$, then we have $h(2)=2, h(4)=4, h(6)=6, h(8)=8, h(10)=10, h(12)=1$ and so on. Note that even the odd-bins would be filled uniformly. This phenomenon is due to $2$ and $11$ are coprime.