Sort sequences of the form ABCABCABC, ABABAB, ABCDABCD, etc.

363 Views Asked by At

You are given a sequence of distinct elements $P$ and a positive integer $R$. Consider the sequence obtained by repeating $P$ $R$ times and concatenating together. For example, if $P = \langle A, B\rangle$ and $R = 3$, then the resulting sequence $P^R$ is $\langle A, B, A, B, A, B \rangle$.

We would like to transform this sequence into another sequence such that identical elements are now adjacent to each other (more formally, for a sequence $S$, we would like to find a permutation $S'$ of $S$ such that for any two $S'_i$ and $S'_j$ such that $S'_i = S'_j$, for any integer $i \le k \le j$ we must have $S'_i = S'_k = S'_j$). The only operation we can perform on this sequence is swapping two elements.

For example, with $P = \langle A, B \rangle$, let $S = P^3 = \langle A, B, A, B, A, B \rangle$. Consider $S' = \langle A, A, A, B, B, B \rangle$. We can transform $S$ to $S'$ by swapping $S_2$ and $S_5$, thus, we can obtain $S'$ from $S$ by using a single swap. We can also consider $S' = \langle B, B, B, A, A, A \rangle$, in which case we need two swaps.

Is there an algorithm to do this? How should this be done if we want the minimum number of swaps?

2

There are 2 best solutions below

0
On BEST ANSWER

Disclaimer: This solution is provably optimal with the assumption that the function we want to minimize is the number of elements that have to be swapped. I think this should also be optimal for the other case, but I do not give such proof.

We denote by "displaced element" the elements of $S$ that is different than the element of $S'$ in the same location (that is, the element that need to be swapped). If $R$ divides $|P|$, then we can use any possible $S'$ and we will have the same number of displaced elements. Thus, we consider the case where $R \not = P$.

We will first consider the case where $|P|$ and $R$ are coprime to each other. $S'$ consists of $|P|$ group of $R$ consecutive characters, one for each element of $P$. Let the element of the $i$-th group of consececutive characters in $S'$ be noted by $G_{S'}(i)$ (for example, if $S' = \langle A, A, B, B \rangle$, then $G_{S'}(1) = A$ and $G_{S'}(2) = B$).

Now, consider $S'$ obtained by setting $G_{S'}(i)$ to the $(((i-1)R \mod |P|) + 1)$-th character in $P$. Since $|P|$ and $R$ are coprime to each other, $1 \le R \mod |P| < |P|$, and thus all the elements of $G_{S'}$ are unique. In particular, each character in $P$ appears exactly once. Thus, $S'$ is a valid transformed sequence.

Now we claim that this $S'$ consist of a minimum number of displaced elements. We show this by showing that for each group in $S'$, there exists exactly $\lceil R / |P| \rceil$ elements that are not displaced (note that this is the maximum possible such elements for each group, thus this is optimal). The index of the first character of the $i$-th group is $(i-1) R + 1$. However, the $(i \cdot R + 1)$-th character of $P^R$ is exactly equal to $G_{S'}(i)$ and thus the claim holds.

Now suppose $R$ and $|P|$ are not coprime to each other. Let $x = \gcd(R, |P|)$ and $y = \text{lcm}(R, |P|)$. We now set $G_{S'}(i)$ to the $(x((i-1)(R/x) \mod (|P|/x)) + \lfloor (i-1) / y \rfloor + 1)$-th character of $P$. This is also optimal and have the same guarantees as the case where $R$ and $|P|$ are coprime.

1
On

Let's say that your sequence is given as array $A$ and $A[i]$, $1\leq i\leq PR$, $i$-th symbol in $A$.

If you want to sort $A$, then elements of $A$ have to be comparable, i. e. given two elements $x$ and $y$ of $A$, you have to be able to determine whether $x$ is greater than $y$, $y$ is greater than $x$, or $x$ equals $y$.

Given only a sequence hot, cold, hot, cold we can argue which of cases hot $<$ cold, hot $>$ cold, hot $=$ cold holds. Skier would go for the first, swimmer for the second and a stone (which doesn't care of temperatures) would go for the third case, hence the algorithm itself cannot know how to compare these elements.

If you know, how to compare the elements of $A$, proceed as follows:

  1. Sort the subsequence $A[1]$, $\dots$, $A[P]$ and save the result into array $B$
  2. Define array $C$ of length $PR$.
  3. for j = 0, 1, ... , R - 1 repeat: for i = 1, 2, ... P repeat: C[P * j + i] = B[i]
  4. Return $C$

This is algorithm with running time $\mathcal{O}(PR\log P)$, which is faster than sorting the whole given array $A$ (which would take $\mathcal{O}(PR\log (PR))$ time).