Lower bound on the probabilty that two random elements collide under function evaluation

44 Views Asked by At

Consider the following statement

Let $f \colon M \to N $, where $M$ and $N$ are finite sets. Then $$\Pr_{m,m' \in M}[f(m) = f(m')] \geq 1 / |N|.$$

Originally, I thought this should be obvious, and that I should be able to "see" directly that this is true. However, I have only been able to prove it as follows (note that this proof assumes that $m$ and $m'$ are chosen uniformly from $M$, but this restriction can be lifted by following the suggestion in @antkam's comment below).

Let $M_y = \{ m \in M \mid f(m) = y \}$. Since, $m$ and $m'$ are chosen independently and uniformly from $M$, we have: $$\Pr_{m,m' \in M}[f(m) = f(m')] = \sum_{y \in N}\Pr[m \in M_y \land m' \in M_y] = \sum_{y \in N} |M_y|^2 / |M|^2$$ By Cauchy–Schwarz, we have: $$\sum_{y \in N} |M_y|^2 \cdot \sum_{y \in N} 1^2 = \sum_{y \in N} |M_y|^2 \cdot |N| \geq \left( \sum_{y \in N} |M_y| \cdot 1 \right)^2 = |M|^2, $$ and the statement follows.

However, this proof gives me no insight into why the statement is true. Thus, I would like some intuition and (hopefully) a proof that explains why this is the case.

1

There are 1 best solutions below

1
On BEST ANSWER

Your proof is legitimate and quite slick; why argue with perfection?

If you want intuition into the result, consider the task of making a collision as rare as possible. Assume that $f$ maps the finite set $M$ into the finite set $N$. (Note that the nature of the target space $\{0,1\}^n$ is irrelevant; what matters is the number of elements $|N|$ in the target space.) Intuitively a collision is least likely in the case $|M|=|N|$ because you can create a function $f$ that is bijective between $M$ and $N$. And if $|M|<|N|$, it isn't even possible for $f$ to take on $|N|$ values. These last two statements can be made rigorous: If $|M|\le |N|$, then $$P(f(m)=f(m'))\ge P(m=m')=\frac1{|M|}\ge\frac1{|N|}.$$ In the case $|M|>|N|$ there is guaranteed to be at least one pair $m\ne m'$ where $f(m)=f(m')$, so your probability of collision can "only" be higher than the case $|M|=|N|$. Exactly how "only" can be made rigorous is where your proof comes in.


EDIT: Here's an alternative proof that's not much different from yours, but may provide more insight. The probability of collision is $P(f(m)=f(m'))=\sum_{y\in N} P(f(m)=y)^2$. This last is the sum of squares of $|N|$ numbers that add up to $1$. If an appeal to Cauchy-Schwarz seems unsatisfactory, you can prove the result

If $p_1,p_2,\ldots,p_n$ sum to $1$, then $\sum p_i^2\ge\frac1n$.

more intuitively by writing

$\sum p_i^2=\sum[(p_i-\frac1n) +\frac1n]^2=\sum(p_i-\frac1n)^2 + \frac1n$,

since the cross term vanishes. Written this way, it's clear that the only way to achieve the minimum value $1/n$ is if each $p_i$ is equal to $1/n$. Translated back to the context of $P(f(m)=y)$, the probability of collision achieves its minimum value $1/|N|$ when and only when $f(m)$ has a uniform distribution over the values in $N$. In the special case where $m$ is chosen uniformly from a finite set $M$, the minimum $1/|N|$ cannot be achieved unless $|M|$ is a multiple of $|N|$.