Permutation Metric that does not penalize heavily switches among contiguous chunks

87 Views Asked by At

Consider y, z - two permuations of the identity permutation x

x = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


y = (7, 8, 9, 4, 5, 6, 0, 1, 2, 3)
z = (1, 0, 3, 2, 5, 4, 7, 6, 9, 8)

y swaps between 3 contigous sub-sequences.

z makes 5 local adjacent swaps.

What permuation metric M is there that satisfies

M(x, y) < M(x,z) ?

I wish such metric to emphasize the fact that y keeps longer contigous substrings of x.

In that sense Kendall's $\tau$ is closer to what I wish than Spearman's $\rho$.

However, both metrics penalize more heavily the distance between x and y, than the distance between x and z:

$\rho(x,z) < \tau(x,z) < \tau(x,y) < \rho(x, y)$

(assume the symbols $\rho$ and $\tau$ are the metrics derived from Spearman's and Kendall's coefficeints)

What metric M would penalize more on many local changes than on just a few swaps of long contigous chunks?

Or in the example above: what metric would guarantee M(x, y) < M(x,z) ?

1

There are 1 best solutions below

0
On

What would you want for metric to say for

x = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


y = (7, 8, 9, 4, 5, 6, 0, 1, 2, 3)
w = (4, 5, 6, 0, 1, 2, 3, 7, 8, 9)

I got the impression that you want $M(x, y) = M(x, w)$. In that case, using correlation coefficients isn't going to work because you are not interested in correlation. I don't know for any common metric that catches the property that you are looking at, but why not trying coming up with something. Maybe you could calculate lengths of contiguous chunks, so in this example you would have for $x$ and $y$ lengths of $(3, 3, 4)$, and then give $M(x, y) = \frac{1}{3} + \frac{1}{3} + \frac{1}{4}$, and the same for $x$ and $w$. For $z$ in your example you would have $M(x, z) = 9$.

If, on the other hand, you want $M(x, y) < M(x, w)$ for some reason (or $>$), what is the reason you want that for? Maybe you could try some linear combination of correlation coefficient and the metric I suggested above?

My suggestion is, generate some fair amount of examples and try to fit your metric manually to them. You could try using $f(a_1, ..., a_k)$ = $\frac{1}{a_1^s} + ... + \frac{1}{a_k^s}$, where $a_i$-s are lengths of contiguous chunks, and then fit $s$ so that it behaves well on your examples. My above example has $s = 1$, but maybe you don't want that in some cases. Say

x = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

y = (14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1)
z = (15, 14, 13, 12, 11, 10, 9, 6, 7, 8, 3, 4, 5, 0, 1, 2)

for $s = 1$ would give $M(x, y) = M(x, z) = 8$. Maybe you want to penalise number of chunks, so you could have $f(a_1, ..., a_k)$ = $\frac{1}{a_1^s} + ... + \frac{1}{a_k^s} + k^p$ and then optimise for $s$ and $p$.

It is hard to say more without knowing what you want from your metric on some more examples. So, I suggest investigating what you want and then doing some sort of manual fitting to your examples. In case you can't describe to yourself why do you want $M(x, y) < M(x, z)$ for some examples, but know that you want it, maybe you could even try doing some machine learning, but then you would need more examples, and I wouldn't suggest it unless you find it necessary.