Probability of Collision [binary vector]

272 Views Asked by At

Given a collection of nonzero distinct binary vectors $x_1,..., x_m \in {0,1}^n$. To make look up faster, we hash them to vectors of length $p<n$ by means of a lienar mapping $x_i \rightarrow Ax_i$ where $A$ is a $p\times n$ matrix with 0-1 entries, and all computations are performed modulo 2.

Suppose the entries of the matrix are picked uniformly at random (each independent coin toss)...

1- If we were to pick any $1 \leq i < j \leq m $ what is the prob. that $x_i$ and $x_j$ hash to the same vector and we have a collision?

My attempt: I think that this is similar to the birthday problem from https://preshing.com/20110504/hash-collision-probabilities/ but I'm really unsure that we are bringing in an exponential 'e'. I'd guess from that link that the probability of collision = 1 - prob(no collision). So P(collision )= $1- e^{-p*(p-1)/2m}$ =$1- e^{-2*(2-1)/2m}$ = $1- e^{-1/m}$. Is that correct or completely the wrong? ... or would you leave the answer in terms of p (I think we only pick one i and j so we can simplify here?

2- Prove that if $p \geq 2log_2 m$, then with probability atleast $\frac{1}{2}$ there are no collisions among the $x_i$.

My attempt: I'd say I need the answer to above (which is why I was concerned) .... but we would need $\frac{1}{2} =$ P(No Collision) =$e^{-p*(p-1)/2m}$ ... but this is not simplifying to $log_2$ and instead would mean $2mln(\frac{1}{2})= -p^{2}-p$. which is wrong b/c I will have that ln follow around ...

2

There are 2 best solutions below

2
On BEST ANSWER

I think you should have 2^p in place of p in your formula then you get the correct answer Since you are mapping into {0,1}^p each x_i gets sent to a random entry of {0,1}^p uniformly at random so it's the birthday problem with 2^p birthdays and m persons so by your estimate $\frac{1}{2} =$ P(No Collision) =$e^{-(2^p)*(2^p-1)/2m}$ $2mln(\frac{1}{2})= -(2^p)^{2}+(2^p) $. which is closer I think it's best if you use the theorem that you get probability about 0.5 if you have that the square root of the number of possible birthdays is equal to the number of people (once both are large enough)

1
On

Hints: Note that $Ax_i=Ax_j$ if and only if $A(x_i-x_j)=0$. By assumption $x_i-x_j\neq 0$ for $i\neq j$. So you can reduce your problem to determining the probability $Az=0$ for any $z\neq 0$. Can you show that each component of $Az$ is uniformly random in $\{0,1\}$, and that the components are independent?

If you can, this will imply the probability $Ax_i=Ax_j$ is $1/2^p$. By linearity of expectation, this means the expected number of collisions among all the $x_i$ is ${m \choose 2} 2^{-p}\leq \frac{m^2}{2\cdot 2^p}$. What happens when you plug in $p=2\log_2 m$? Why does this imply what you want about the probability of a collision?