Squared multiplicty minimization in multiple set choosing problem

44 Views Asked by At

The problem is the IOI 2007 competition's Sails problem. It has also appeared in Pranav A. Sriram Olympiad Combinatorics notes (Chapter 1 Exercise 14). I copied the problem description from Pranav. You are given $n$ integers $a_1, a_2, \dots, a_n$ and another set of n integers $b_1, b_2, \dots, b_n$ such that for each $i, b_i \leq a_i$. For each $i = 1, 2, \dots, n$, you must choose a set of $b_i$ distinct integers from the set $\{1, 2, \dots, a_i\}$. In total, $(b_1 + b_2 + \dots + b_n)$ integers are selected, but not all of these are distinct. Suppose $k$ distinct integers have been selected, with multiplicities $c_1, c_2, c_3, \dots, c_k$. Your score is defined as $\sum_{i=1}^{k} c_i(c_i-1)$ . Give an efficient algorithm to select numbers in order to minimize your score.
The greedy algorithm is that you sort by $a_i$ into increasing order. After this you go through one by one and for a given $i$ you choose the set of numbers from $\{1, \dots, a_i\}$ so that the increase in the objective function is minimal.
Now the problem I have is the following. I understand the greedy algorithm. I also understand why it is locally optimal (trivial), but I have been unable to prove for the past $3$ days that the greedy algorithm is optimal. I have searched the internet for a proof but did not find one. I have also given this problem to everybody I know, but no solution has been shown to me so far. I hope somebody here can prove that the greedy algorithm is optimal.

I have tried the following things among many others.
Going from an easier problem to this problem, like starting from this problem. $$ \begin{equation*} \begin{array}{ll@{}ll} \text{minimize} & \displaystyle\sum\limits_{i=1}^{k} c^2_i &\\ \text{subject to}& \displaystyle\sum\limits_{i=1}^{k} c_i = \displaystyle\sum\limits_{j=1}^{N} b_j \\ &c_i \in \mathbb{R_+} \\ &b_j \geq 0 \end{array} \end{equation*} $$ Here you can show that the optimal solution is when $c_i=c_j$. Then if you exchange the condition of $c_i \in \mathbb{R_+}$ to $c_i \in \mathbb{N} = \{1,2,\dots\}$ then your optimal solution need to have the following property $\forall i,j$ $|c_i-c_j| \leq 1$. After this, I was able to show that for optimal and greedy solutions it is true that

If $\exists i,j$ such that $c_i>c_j+1$ and $\exists l$ such that $\max(i,j) \leq a_l$ and $i \in S_l, j \notin S_l$ then we can exchange a sail on the $l$th mast between levels of $i$ and $j$ such that the objective value strictly decreased.

But one can show with an example that this does not imply optimality. I have also tried induction and that also failed. Other exchange-like arguments failed also.