I've heard of some probability questions related to collision probabilities and hash functions and want to generalize a simple problem. So consider my variation:
Say you have:
- x, which is a binary vector of length N
- A, which is a P x N dimensional random binary matrix, where each element is either 0 or 1, drawn independently from a Bernoulli distribution
- b, which is a length P vector, which you get from the left matrix multiplication as: b=Ax, after which you do mod 2 arithmetic to make b a binary length P vector. In other words, element i of b is M mod 2, where M is the inner product: dot(ith row of A, vector x).
- say P < N so there is dimensionality reduction going on which is what makes the collision problem interesting.
What is the collision probability for two arbitrary vectors x1 and x2, where the probability is over all possible 2^(PN) completions of the matrix A? I know the result for the case when the elements of matrix A are uniformly likely to be either 0 or 1, but I want to generalize this to the case where each element is generated according to some Bernoulli distribution with known but arbitrary p.
Note $Ax_1=Ax_2\pmod2$ if and only if $A(x_1+x_2)=0\pmod2$, so your question is equivalent to asking what the probability that $Ax=0$ is, where $x=x_1+x_2\pmod2$.
Suppose $x$ has $m$ nonzero entries, that is, suppose that $x_1$ and $x_2$ differ in $m$ places. The probability that the $i^\text{th}$ entry of $Ax$ is nonzero is equal to the probability that a random variable with distribution Binomial$(m,p)$ is even. Citing this other question, this probability is $\frac12(1+(1-2p)^m)$. Since the entries of $Ax$ are independent, the probability of a collision is $$ \left(\frac12(1+(1-2p)^m)\right)^P $$