Expected value of collisions

245 Views Asked by At

There are two definitions that are needed in the following problem:

A probability distribution $H$ over a hash function $h : U \rightarrow \{1, 2, \ldots, m\}$ is said to be universal if for all $x\neq y$, $\sup_{h\in H} P(h(x) = P(h(y)) \leq 1/m$

If $H$ is universal and is a uniform distribution over a set of functions $\{h_1, h_2, \ldots\}$, then it is called a universal hash family


(a) Suppose $H$ is a universal hash family over a set of functions $\{h_1, h_2, \ldots\}$ (there is some abuse of notation going on here -- we say $H$ to mean both the set and the uniform distribution over it) from $U$ to $\{1, 2, \ldots, m\}$ and let $S\subseteq U$ be a set of $m$ elements we want to hash (the number of elements being hashed is equal to the size of the hash table). Now further, suppose that $h$ is drawn uniformly at random from $H$. Show that the expected number of collisions is at most $(m - 1)/2$.

(b) Continuing from (a), now show that the probability that no bin in the table gets more than $2\sqrt{m} + 1$ elements is at least $3/4$ (Hint: Use the previous part and consider Markov Inequality).


I was wondering if someone can please assist me with this problem. The first result is sort of intuitive to me but not really sure about the second one.

Let $X$ be the number of collisions (random variable). Then how can we bound $E(X)$?