Suppose we have two lists of labels $L_1$ and $L_2$. As an example, I will define them as $L_1 = [a, b, a]$ and $L_2 = [a, c, a]$.
For each label in any of the two lists, I would like to count the number of matches between both lists. This should be understood as $matches(x) = min(count(L_1, x), count(L_2, x))$. This way, in the example we would have: $matches(a) = 2$, $matches(b) = 0$, $matches(c) = 0$.
The actual result I seek is the summation of the matches of all labels. In the example, the result should be $matches(a) + matches(b) + matches(c) = 2$.
However, I would like to compute that value using as input the three following matrices:
- $M_1$ is a square matrix with dimensions $length(L_1) \times length(L_1)$, where $M_1[i,j]$ is $1$ if $L_1[i] = L_1[j]$ and $0$ elsewhere;
- $M_2$ is another square matrix with dimensions $length(L_2) \times length(L_2)$ where $M_2[i,j]$ is $1$ if $L_2[i] = L_2[j]$ and $0$ elsewhere;
- $T$ is a matrix with dimensions $length(L_1) \times length(L_2)$ where $T[i, j]$ is $1$ if $L_1[i] = L_2[j]$ and $0$ elsewhere.
And furthermore, I need to reach to the result by only performing matrix multiplications, like additions, multiplications, taking the upper diagonal part of the matrix, taking the trace of a matrix, etc.
I have tried different approaches, but I realized I was not handling properly cases like the one described in the example, where there are elements that are common to both lists and repeated also within each list.
For context: this problem belongs to the natural language processing area, specifically to the count of $n$-gram matches. I want to constrain the type of computations to be able to implement them as GPU operations.
UPDATE: the order of the elements of the lists does not affect the value of $match$.