Suppose we have two multisets of positive integers, $A$ and $B$, where the sum of the elements (counted with multiplicity) of the two multisets is the same. Starting from $A$, we would like to arrive at $B$ by applying one of two rules in turn. On a turn, we may either:
- Choose an element, $x$, of $A$ and two smaller positive integers $y$ and $z$ such that $x = y+z$. Decrease the multiplicity of $x$ by $1$ and increase the multiplicities of $y$ and $z$ by $1$ each (increasing the relevant multiplicity by $2$ if $y=z$). There is no requirement for $y$ or $z$ to have been elements of $A$ prior to this. Applying this move incurs a one-point penalty.
or
- Choose two elements, $y$ and $z$, of $A$, reduce each of their multiplicities by $1$ (reduce the single multiplicity by $2$ if $y=z$, which requires that element to have had multiplicity at least $2$ to begin with) and increase the multiplicity of $y+z$ by $1$. This move can be used even if $y+z$ was not previously an element of $A$. Applying this move incurs no penalty.
The resulting multiset, $A'$, will then be our new "$A$" for the next turn and the sum of its elements (counted with multiplicity) will be the same as it was for $A$. We would like to use these rules to go from $A$ to $B$. For instance, suppose $A$ is $\left\{2, 5, 7\right\}$ and $B$ is $\left\{1, 2, 3, 3, 5\right\}$. We could use move 1 to go from $A$ to $\left\{1, 2, 5, 6\right\}$ and then use it again to go to $\left\{1, 2, 3, 3, 5\right\}$, incurring a total of two points.
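To make the moves concrete, here is a small sketch of the two moves acting on `collections.Counter` multisets, replaying the example above (the helper names `split` and `merge` are mine, not part of the problem statement):

```python
from collections import Counter

def split(ms, x, y, z):
    # Type-1 move: replace one copy of x with y and z, where x == y + z.
    # Costs one penalty point.
    assert ms[x] >= 1 and y + z == x and y > 0 and z > 0
    out = ms.copy()
    out[x] -= 1
    out[y] += 1
    out[z] += 1
    return +out  # unary "+" drops entries whose count fell to zero

def merge(ms, y, z):
    # Type-2 move: replace one copy each of y and z with y + z. Free.
    needed = Counter([y, z])
    assert all(ms[k] >= c for k, c in needed.items())
    out = ms - needed  # Counter subtraction keeps only positive counts
    out[y + z] += 1
    return out

# The worked example: {2, 5, 7} -> {1, 2, 5, 6} -> {1, 2, 3, 3, 5}.
A = Counter([2, 5, 7])
A = split(A, 7, 1, 6)   # one point
A = split(A, 6, 3, 3)   # two points total
assert A == Counter([1, 2, 3, 3, 5])
```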
My Question
What would be a computationally efficient algorithm for going from $A$ to $B$ while incurring the smallest possible number of points, i.e. using move $1$ as few times as possible? And is a much more efficient algorithm possible if we are content merely to come close to the minimum number of penalty points in some sense, perhaps even in the informal sense of a "sensible" heuristic that keeps the number of points incurred small?
My Thoughts so Far
First of all, we may assume that all of our type-$1$ moves are applied before any type-$2$ moves: a merge followed by a split can always be reordered into splits followed by merges without increasing the penalty. Secondly, if $A$ and $B$ share some elements, there is always an optimal solution that simply preserves them, i.e. at no point reduces their multiplicity in our "$A$" below their multiplicity in $B$.
If $A$ has $a$ elements counted with multiplicity and $B$ has $b$ elements counted with multiplicity, then we will need to incur at least $\max{\left(0, b-a\right)}$ penalty points because we will need to "create" at least that many new elements by using a type $1$ move. We will never need to incur more than $b-1$ points because we could always just use move $2$ to merge all of $A$'s elements together and then use move $1$ $b-1$ times to peel off the elements of $B$.
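Both bounds are immediate to compute; a minimal sketch (the name `penalty_bounds` is mine):

```python
def penalty_bounds(A, B):
    # Lower bound: each type-1 move creates exactly one net new element,
    # so at least len(B) - len(A) such moves are needed when B is larger.
    # Upper bound: merge all of A into one element for free, then peel off
    # the elements of B with len(B) - 1 splits.
    return max(0, len(B) - len(A)), len(B) - 1
```

On the worked example, `penalty_bounds([2, 5, 7], [1, 2, 3, 3, 5])` gives the interval $(2, 4)$, and the two-point solution shown earlier actually achieves the lower bound.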
One way of thinking about the number of points incurred that might be helpful is to keep track of how many elements of $B$ each element of $A$ "contributes" to forming. For instance, if there is a $5$ in $A$ and we use rule $1$ to split it into a $2$, which is matched with a $2$ in $B$, and a $3$, which is combined via rule $2$ with a $4$ from somewhere else to form a $7$ that is matched with a different element of $B$, then we say the $5$ from $A$ contributed to two elements of $B$: the $2$ and the $7$. Each element of $B$ beyond the first that a given element of $A$ contributes to costs one penalty point, so we would like each element of $A$ to contribute to as few elements of $B$ as possible.
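This "contribution" view can be made exact: the minimum penalty equals the smallest possible number of nonzero contributions in a matrix of nonnegative integers whose row sums are the elements of $A$ and whose column sums are the elements of $B$, minus the size of $A$. A brute-force sketch along these lines, workable only for tiny instances (the function names are mine):

```python
def min_penalty(A, B):
    # Minimum penalty = (fewest nonzero entries in a nonnegative integer
    # matrix with row sums A[i] and column sums B[j]) - len(A).
    # Exponential-time search with pruning; tiny instances only.
    assert sum(A) == sum(B)

    def compositions(total, caps):
        # All ways to write `total` as nonnegative parts with part j <= caps[j].
        if len(caps) == 1:
            if total <= caps[0]:
                yield (total,)
            return
        for first in range(min(total, caps[0]) + 1):
            for rest in compositions(total - first, caps[1:]):
                yield (first,) + rest

    best = [len(A) * len(B) + 1]

    def search(i, remaining, nonzeros):
        if nonzeros >= best[0]:
            return  # cannot beat the best solution found so far
        if i == len(A):
            # Every row is fully distributed and the sums match,
            # so `remaining` is all zeros here.
            best[0] = nonzeros
            return
        for parts in compositions(A[i], remaining):
            search(i + 1,
                   tuple(r - p for r, p in zip(remaining, parts)),
                   nonzeros + sum(1 for p in parts if p))

    search(0, tuple(B), 0)
    return best[0] - len(A)
```

For example, `min_penalty([2, 5, 7], [1, 2, 3, 3, 5])` returns $2$, matching the two-split solution above.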
The problem is NP-hard, by reduction from the partition problem.
Let $A$ be any multiset whose sum (counting multiplicity) is $s$; we may assume $s$ is even, since otherwise the partition instance is trivially a "no". Let $B=\{s/2,s/2\}$ be the multiset containing two copies of $s/2$. Then there exists a solution transforming $A$ to $B$ with no points (i.e., using only type-2 moves) iff $A$ is a "yes" instance of the partition problem.
You can also reduce from the subset sum problem, if you find that conceptually simpler.
If you need to solve it in practice, you can formulate it as an instance of integer linear programming and solve it with an off-the-shelf ILP solver. Let $A=\{a_1,\dots,a_m\}$ and $B=\{b_1,\dots,b_n\}$. Introduce integer variables $t_{i,j}$ and zero-or-one variables $s_{i,j}$, with the following intended meaning:
Via type-1 moves, we split $a_i$ into the sum $a_i=t_{i,1}+\dots+t_{i,n}$ (ignore the $0$ terms in this sum).
Then, via type-2 moves, we construct $b_j$ as the sum $b_j = t_{1,j} + \dots + t_{m,j}$.
Therefore, $t_{i,j}$ contains the contribution from $a_i$ that flows to $b_j$.
If $s_{i,j}=0$, then $t_{i,j}=0$.
Therefore, $s_{i,1}+\dots+s_{i,n}-1$ counts the number of type-1 moves applied to $a_i$.
Given these ideas, we can enforce them using integer linear constraints, as follows:
- $t_{i,1}+\dots+t_{i,n} = a_i$ for each $i$;
- $t_{1,j}+\dots+t_{m,j} = b_j$ for each $j$;
- $0 \le t_{i,j} \le \min(a_i, b_j) \cdot s_{i,j}$ for each $i,j$ (so $s_{i,j}=0$ forces $t_{i,j}=0$);
- $s_{i,j} \in \{0,1\}$.
Finally, we minimize the sum $\sum_{i,j} s_{i,j} - m$, as that counts the number of type-1 moves needed.
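As a sketch of this formulation using SciPy's `milp` solver (this assumes SciPy ≥ 1.9; the big-M coefficient $\min(a_i,b_j)$ is a valid upper bound on each flow, which keeps the linking constraints reasonably tight):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def min_penalty_ilp(A, B):
    # Decision vector: [t_(1,1),...,t_(m,n), s_(1,1),...,s_(m,n)],
    # with t the integer flows and s the 0/1 indicators described above.
    m, n = len(A), len(B)
    N = m * n
    c = np.concatenate([np.zeros(N), np.ones(N)])  # minimize sum of s

    rows, lo, hi = [], [], []
    for i in range(m):                 # sum_j t[i,j] == a_i
        r = np.zeros(2 * N)
        r[i * n:(i + 1) * n] = 1
        rows.append(r); lo.append(A[i]); hi.append(A[i])
    for j in range(n):                 # sum_i t[i,j] == b_j
        r = np.zeros(2 * N)
        r[j:N:n] = 1
        rows.append(r); lo.append(B[j]); hi.append(B[j])
    for i in range(m):                 # t[i,j] <= min(a_i, b_j) * s[i,j]
        for j in range(n):
            r = np.zeros(2 * N)
            r[i * n + j] = 1
            r[N + i * n + j] = -min(A[i], B[j])
            rows.append(r); lo.append(-np.inf); hi.append(0)

    res = milp(c,
               constraints=LinearConstraint(np.array(rows), lo, hi),
               integrality=np.ones(2 * N),
               bounds=Bounds(np.zeros(2 * N),
                             np.concatenate([np.full(N, np.inf),
                                             np.ones(N)])))
    # Subtract the constant m to convert "number of pieces" into penalty.
    return round(res.fun) - m
```

Note that subtracting the constant $m$ does not affect which solution is optimal, so the objective handed to the solver is simply $\sum_{i,j} s_{i,j}$.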