Inspired by a word game Waffle, see footnotes if interested. The abstracted problem:
You're given an input array of letters, some of which might be identical (i.e. repeated), e.g. GSAAKD. You are also given a desired outcome, which is a second array containing the same multiset of letters, e.g. ASKGAD. The challenge is to make the minimum no. of swaps to turn the first array into the second array.
In this example:
The shortest path is
gSAaKD -> ASaGkD -> ASKGADof $2$ total swaps.The first letter in the desired outcome is
A, but it makes a difference whichAyou move there. E.g. if your first move isgSaADK -> ASGAKDthen you cannot finish in $1$ more swap ($2$ total swaps).
Related / special case: If the letters are all distinct, then this is the well known problem of minimum no. of swaps to turn one permutation into another. Can someone confirm what I remember about the solution in this case?
The min. no. of swaps $= N -$ no. of cycles
A greedy algorithm where each step you put one letter into its correct position achieves the minimum.
Questions: Bounty will be awarded if you answer either Q1 or Q2. If one person answers Q1 and another person answers Q2, I will award the full bounty to each. :)
Q1: characterization: What is a good way to characterize the answer? Are cycles even well-defined when there are repeats? (I'm looking for a characterization beyond the obvious one where we consider all possible mappings of repeated letters to their correct positions and just apply the permutation solution to each.)
Q2: greedy algorithm:
First let me propose the following greedy algorithm: At every step, make a move s.t. either (A) or (B) happens (or both):
(A) a non-repeated letter gets to its (unique) correct position,
(B) both swapped letters end up at correct positions (in this case, a swapped letter might be one of a repeated set, and it ends in one of its correct positions in the desired outcome).
The greedy algorithm as specified above is "incomplete" in the sense that one may be unable to find a move that meets (A) or (B). However, suppose there is such a sequence of moves (where every step meets (A) or (B)) resulting in the desired outcome. Is such a sequence guaranteed to achieve the minimum? Alternatively, can you devise a different greedy algorithm that can reach the minimum?
Footnote on motivation: In Waffle, some letters are jumbled in a crossword grid and the challenge is to minimize no. of swaps to make all the words correct. Waffle is a weird game in that it is both a word game (anagramming) and a combinatorial game (minimizing no. of swaps). This question is about the latter, i.e. how I can minimize no. of swaps after solving the entire grid and knowing the desired outcome.
Thought I found it, but no, this algorithm just came very close.
A Failed Algorithm
Define the input to be the original string, $s$, and the desired string $e$. Assume that $s_i\neq e_i$ holds $\forall i$. Now, we want to get the shortest cycle there is in the two-line cycle notation where $s$ is written above $e$.
Start with every unique letter in $s$, call it $A$, since a cycle repeats by definition, search for the length of the shortest route to get back to $A$ (the first time you encounter it). It is clear that if this is done for every letter the shortest cycle must be found (since all start and end letters are verified). Using memoization (on starting letter and ending letter) this search can be done in $O(d^2)$. Also, note that different letters may not be visited twice as this can lead to an infinite loop (and a longer cycle than necessary).
Once the shortest cycle is found, add $len(cycle)-1$ to the result, remove all indices of the cycle, and repeat this process until $s$ and $e$ are empty.
The problem is, there can be multiple cycles with the shortest length, only the maximum disjoint set (MDS) of cycles of that length can be flipped at the same time to give some/the correct answer.
An Example
I will use "AEBCFABC" to "BCEFABCA" as an example (the answer is $6$ swaps). $$ \begin{pmatrix} A\ E\ B\ C\ F\ A\ B\ C \\ B\ C\ E\ F\ A\ B\ C\ A \end{pmatrix} $$ We will look for the shortest cycle contained above (remember that only paths with no repeated letters are denoted). We get $$ A\to B \to \{C,E\}\to A \\ B\to \{C,E\}\to \{A, F\} \to B \\ C\to \{A,F\}\to B \to C \\ E\to C\to \{A, F\} \to B \to E \\ F\to A\to B\to \{C, E\} \to F\,, $$ the shortest cycle length is thus $3$ and therefore we add $2$ swaps to the total (to correctly swap $ABC\to BCA$) and remove the $3$ cycle leaving "AEBCF" to "BCEFA".
The "proof"
Once you swap a cycle (for now a cycle of distinct letters) you know that afterwards all letters are in the correct spot. Thus if once the shortest cycle was $3$, then after swapping the shortest cycle this number can only increase, never decrease.
You always want to get the least amount of swaps per letter you place correctly, since swapping a cycle costs $n-1$ swaps the ratio of swaps to correct placements becomes $\frac{n-1}{n}$, this is naturally minimal if $n$ is minimal (also, see the comments in the OP). Therefore, if we repetitively swap the shortest cycle we end up with the best we can achieve.
Now, for duplicate letters. It is clear that in the above example instead of swapping the final $3$ letters I also could have chosen the first and the last two. We agree that swapping the shortest cycles is the best, thus now, imagine coloring all cycles which have exactly that length. If a cycle can be colored without overlap, my algorithm will detect it and correctly swap it. If there are (multiple) cycles with overlap, my original algorithm will detect only one and swap it. However, this is where the MDS needs to be used, an NP-complete problem, as many cycles as possible with the shortest length need to be flipped, a (somewhat complex) example of where my original algorithm goes wrong is "ABCDCFAGB" to "BCABDCFAG". Also, maybe a flip of a cycle of length 3 breaks a cycle of length 4 causing more problems.
Python program implementing my faulty algorithm
This program can compute an upper bound for the answer for (random) lowercase strings of length 1000 in about a second.