i've got a problem that feels very much like it's NP-hard but I would love some help proving it primarily.
Secondary to that, if an optimal polynomal time algorithm can be proposed that is even better, although even good "greedy" approximations are fine.
Given $M$ marbles, $B$ buckets and each bucket having capacity $K$, where $M = BK$ , find some distribution of the marbles that maximizes the number of colours that are shared. colour $A$ and colour $B$ are considered to be shared if they appear together in the same bucket.
The distribution of colours is known for a given problem, and can be sorted and added in any order you desire. We can define a distribution as follows:
$m_1 + m_2 + m_3 + \dots + m_n = M = BK$
where $n$ is the the number of colours. We also have $0 < m_k \le M = BK$ for some $k$th colour
Example:
| BucketId | Colours |
|---|---|
| $0$ | $A, C$ |
| $1$ | $A, C$ |
| $2$ | $B, D$ |
| $3$ | $B, D$ |
$M = 8, B = 4, K = 2 \\m_1 = 2, m_2 = 2, m_3 = 2, m_4 = 2$
Clearly, this is non-optimal since we may swap $C$ in bucket 1, with $D$ in bucket 2 and achieve a better outcome (more colours shared).
Any insight into this problem would be infinitely appreciated, Thanks :)
Thought I might answer my own question, as to the best hueristic I have found so far. Certainly is not optimal, and I still haven't found any way to reason whether this problem is NP-hard or not.
This just seems to be an approach that is consistently giving close to optimum. I have not been able to prove an approximation ratio either though. The algorithm in itself is very simple, and has out-performed other more complicated ones I have tried.
Start by ordering all $n$ colours such that $m_k \ge m_{k+1}$
I now take each marble in the list and add it to the bucket with the fewest marbles. Taking the lower indexed bucket in a tie-breaker scenario. This is effectively doing some round-robin across the buckets. This might give us something like
$M = 10, B = 5, K = 2 \\m_1 = 3, m_2 = 2, m_3 = 2, m_4 = 2, m_5 = 1$
The idea now is simple, for each entry in the table, where $B_i \ne B_j$ (colours in different buckets): We swap them, if more colours are shared, we have achieved a better table. Save this table and repeat the process. If they do not increase sharing, revert the swap, and try a different valid pairing.
Keep doing so until no more colours may be shared. An example swap above might produce:
Which takes us from 4 colours shared to 5