Multiset Collision Probability Generalization

203 Views Asked by At

I'm trying to generalize the collision probability problem. So let's say:

There are N non-empty multisets: $X_1, X_2,…,X_N$

Each multiset, $X_i$, has $|X_i|$ elements in it, and each element is one of $T$ discrete values {1,2,…,T}

Assume uniform probabilities for all elements, for all multisets.

So we can think of it like a checkers board with stacks of indistinguishable pieces on it, e.g.:

enter image description here

For a given multiset, $X_i$ (row), a space can have anywhere from $0$ balls to $|X_i|$ balls (all the balls for that multiset are on that one space).


Parameterize a “collision” on this set of multisets (this checkerboard) by 3 parameters: K, M, J:

  • K is related to the number of multisets (out of the N total multisets) that must satisfy some condition on the number of elements that are shared.
  • M is related to the multiplicity of the elements that are shared between multisets
  • J is related to the number of unique elements (out of the T total values) that must be shared between multisets

So define a “collision” as:

Exactly K of the N multisets satisfy the sharing condition on exactly J unique elements, where “satisfy the sharing condition” means: for each of the J unique elements, all of the K multisets have multiplicity of exactly M on that unique element


EXAMPLE:

Say we use the collision parameters (K=2, M=4, J=3), for a setup with T=7, and N=5 multisets with the below arbitrary number of balls (cardinalities) in the multisetes:

enter image description here

So in this case, the set of multisets {$X_1,X_2,X_3,X_4,X_5$} does have a collision because for exactly K=2 of the multisets ($X_3$ and $X_5$), there are exactly J=3 values (2, 4, and 6) that satisfy the element sharing condition (they have multiplicity M=4).

But for example, $X_1$ does not collide with $X_3$ or $X_5$ because it does not satisfy the J=3 condition: even though on the values 4 and 6 it also has the correct multiplicity M=4 and so it “shares” those values with $X_3$ and $X_5$, it does not collide because $X_1$ does not have the correct multiplicity on the value 2 (it has multiplicity M=1 instead of 4). So, $X_1$ is not considered to collide, so the J=3 condition is not met, and this checkerboard (set of multisets) is NOT considered to be a collision. On the other hand, if $X_1$ had multiplicity M=4 instead of M=1 on the value 2, then the checkerboard would be counted as a collision.


Objective: So, what I’m trying to derive is an expression for the probability of collision, parameterized by (K,M,J), for any given set of multisets {$X_1,X_2,...,X_N$} with cardinalities {$|X_1|,|X_2|,...,|X_N|$}, for the case of uniform probabilities (all elements are equally likely for all multisets).


I think the way to go is to use Probability = # valid configurations / #total configurations.

So for the denominator, the total number of arrangements for a given multiset $X_i$ is independent of the total number of arrangements for a different multiset $X_j$, so the overall total number of configurations is the product of the number of configurations for each of the $X_1, X_2, …, X_N$. For a given $X_i$, the number of arrangements it can have is a stars and bars multichoose problem because it is like the number of ways of partitioning $|X_i|$ indistinguishable balls into T distinguishable bins. So the total number of arrangements for multiset $X_i$ with cardinality $|X_i|$ is:

$$N(|X_i|) = {{T-1+|X_i|}\choose{T-1}}$$

So then the denominator #total configurations is:

$$\prod_{i=1}^{i=N} {{T-1+|X_i|}\choose{T-1}} = \frac{\prod_{i=1}^{i=N}(T-1+|X_i|)}{\left((T-1)!\right)^N \prod_{i=1}^{i=N}(|X_i|!)}$$

So the remaining part is more complicated. The numerator needs to count the number of colliding configurations, i.e. those configurations that are considered collisions based on the definition above.


Bounty: Awarding a bounty for a thorough derivation of a correct expression for the probability of collision, parameterized by (K,M,J), for any given set of multisets {$X_1,X_2,...,X_N$} with cardinalities {$|X_1|,|X_2|,...,|X_N|$}, for the case of uniform probabilities (all elements are equally likely for all multisets).