Lower Bound on Elements in Common After Relabeling

47 Views Asked by At

We have two lists of length $N$, $L_1$ and $L_2$. Each list consists of elements $\{1,\ldots,G\}$, where $G\ll N$. Denote the fraction of matching elements as $F(L_1,L_2)\equiv\frac{1}{N}\sum_k 1(L_1(k)=L_2(k))$.

Now fix the values of $L_1$, but allow permutations of the labels in $L_2$, e.g., $\{1,2,\ldots,8\}$ to $\{5,8,\ldots,3\}$. Define the maximal proportion matched across all permutations as $M(L_1,L_2)\equiv\arg\max_{P(G)} F(L_1,L_2(P(G))$.

Given $L_1$, and searching over $L_2$, what is the greatest lower bound on the maximal proportion $M$?

EDIT: As an example, suppose $L_1$ were {1,1,...,1} and $L_2$ were {2,2,...,2}. We would relabel 2 as 1 to get $L_2'$ as {1,1,...,1} and a match for every element. So, the problem can be understood as, given a list $L_1$, produce the adversarial list $L_2$ with the least number of matchings for the most favorable cipher (rearrangements of the alphabet $\{1,\ldots,G\}$).

If there is no closed-form solution, is there an algorithm to efficiently obtain this lower bound given $L_1$ and $G$, that is, one that does not require a search space that grows exponentially in $N$?

1

There are 1 best solutions below

6
On

Your lists are the same as words of length N from the alphabet $\{1,2,\cdots, G\}$.

Understanding "in common" as "matching (i.e. same value and same position)", then if $L_2$ has $m$ matchings and $q$ characters in common ( independently from the position) with $L_1$, then it is clear that $m \le q$, and that it is always possible to permute $L_2$ as to transform all $q$ into matchings.

As to the probability of having $m$ matchings or $q$ characters in common between two lists (words) of length $n$, I just replied to a similar question: although the probabilities given there is for not having any matching/common character, the Premise therein is what you need to start with. In fact, it is clear that the maximum of the matchings equals the characters that the two strings have in common, each repeated with the lowest repetition found in each string.
Then the minimum of such maximum of matching is just $0$ : the two words do not have any character in common , although derived from the same alphabet. And, assuming the characters have the same probability to appear, the probability of having that is: $$ Q_{\,a} (G,N) = {1 \over {G^{\,2N} }}\sum\limits_{1\, \le \,\,k\, \le \,G} {\;\left( \matrix{ G \cr k \cr} \right)\left( {\,G - k} \right)^{\,N} } k!\left\{ \matrix{ N \cr k \cr} \right\} $$

And it seems that what you need is an asymptotic expansion of it for $N \to \infty$: is that right ?