Algorithm to solve balls in bins problem where each bin has an associated cost

131 Views Asked by At

I'm a software engineer, currently trying to solve a problem that can be modelled by the allocation of a known number of balls $n$ to a known number of bins $r$ ($0\leq n \leq 1000000$, $1 \leq r \leq 100$). The difficulty I'm having is that:

  • Each bin has an associated cost per ball, and the total cost must be kept below some known value
  • I need to maximise the 'spread' of the balls across bins as much as possible.

My original idea was to formulate the problem as an integer programming problem with

  • constraints encoding the maximum overall cost and the number of balls
  • an objective function of:

$$ \sum^r_{i=0}a\ln(x_i+1) $$

where $x_i$ is the number of balls in the bin $i$ and $a$ is some constant chosen to keep numbers reasonable. The intention here is that a log function, with its decreasing gradient, 'rewards' each new ball in a bin less, and therefore encourages the algorithm to add balls more to emptier bins. This function isn't set in stone, though - anything with similar properties will do!

The difficulty I'm having is that I can't figure out how to solve this problem. Obviously, there is theoretically a polynomial-time solution (a trivial example being "enumerate all possible points") but this is intractable in the real world as I'm working in tens of sizable dimensions.

My most recent thinking has been that a log function is easily differentiable, so something like a hill-climbing approach might work, but I'm unsure how to apply that to an integer problem space, and I'm running out of ideas on what to Google. I've also already come across this question, which seems to be asking about a similar problem, but I'm not sure it's that applicable to my situation.

TIA!

2

There are 2 best solutions below

2
On BEST ANSWER

So I've written a solution based on @Karl's first comment where I generate a uniform solution and evenly reallocate balls from above-cost to below-cost bins, and it takes ~$40$ms for a problem size of ~$74000$ balls and $13$ bins, which is perfectly acceptable for my needs!

The solution this generates is essentially:

  • Bins below the target cost all have (pretty much) the same number
  • Bins above the target cost all have another (smaller) number

I'm satisfied with this solution, so I'm going to mark this message as the answer for my particular question (unless any even smarter ideas appear).

0
On

To maximize the "spread" of the balls across the bins I'd suggest minimizing the difference between number of balls in the most populous bin and the number in the least populous bin, subject to the other constraints. The problem is then the following integer linear programming problem: \begin{align} \text{minimize}&\hspace{3em}m_1-m_0\\ \text{subject to}&\hspace{1em}\cases{x_i-m_1&$\le0\hspace{2em}\text{for}\hspace{1em}1\le i\le r-1$\\ m_0-x_i&$\le0\hspace{2em}\text{for}\hspace{1em}1\le i\le r-1$\\ \sum_\limits{i=1}^{r-1}x_i+m_1&$\ge n\hspace{1.5em}\text{(i.e.}\hspace{1em}x_r-m_1\le 0\ )$\\ \sum_\limits{i=1}^{r-1}x_i+m_0&$\le n\hspace{1.5em}\text{(i.e.}\hspace{1em}m_0-x_r\le 0\ )$\\ \sum_\limits{i=1}^{r-1}(c_i-c_r)x_i&$\le B-nc_r$\\ \sum_\limits{i=1}^{r-1}x_i&$\le n$\\ x_i\ge0,\ \text{integers}&$\hspace{3.5em}\text{for}\hspace{1em}1\le i\le r-1$} \end{align} where $\ c_i\ $ is the cost per ball in bin $\ i\ $, and $\ B\ $ is the budget. I have eliminated the variable $\ x_r $ by replacing it with $\ n-\sum_\limits{i=1}^{r-1}x_i\ $. The constraints on the variable $\ m_0, m_1\ $ ensure that the former is exceeded by all the variables $\ x_i\ $ (including $\ x_r\ $) and the latter exceeds them all. At optimality, the equations $\ m_0=\min_\limits{1\le i\le r}x_i\ $ and $\ m_1=\max_\limits{1\le i\le r}x_i\ $ must hold.

As the Wikipedia article cited above mentions, integer linear programming problems are known to be NP-hard in general. However, Magma's online integer linear program solver had no problem handling a problem with $13$ bins and the necessary $28$ constraints. I ran it with the following data; \begin{align} n&=74000\\ c&=(15,15,10,10,10,10,8,8,8,2,2,1,1)\\ B&=200000\ . \end{align} It returned the following solution: \begin{array}{c|ccccccccccccc} \text{bin}&1&2&3&4&5&6&7\\ \hline \text{#balls in bin}&1105&1105&1106&1105&1105&1105&1105 \end{array} \begin{array}{c|ccccccccccccc} \text{bin}&8&9&10&11&12&13\\ \hline \text{#balls in bin}&1111&1105&16012&16012&16012&16012 \end{array} for which the spread is $\ 16012-1105=14907\ $.

I've included a copy of the Magma script I used below, in case you'd like to play around with different values for the parameters. To run it, copy and paste it into the online Magma calculator and press the "submit" button.

n:=74000;
B:=200000;
c:=Matrix(Rationals(),1,13, [15,15,10,10,10,10,8,8,8,2,2,1,1]);
lhs:=ZeroMatrix(Rationals(),28,14);

rhs:=ZeroMatrix(Rationals(),28,1);
relations:=ZeroMatrix(Rationals(),28,1);
objective:=ZeroMatrix(Rationals(),1,14);
for i in [3..14] do
   lhs[i-2,1]:= 1;
   lhs[i-2,i]:= -1;
   lhs[i+10,2]:= -1;
   lhs[i+10,i]:= 1;
end for;

lhs[25,1]:=1;
lhs[26,2]:=-1;
rhs[25,1]:=n;
rhs[26,1]:=-n;
rhs[27,1]:=B-n*c[1,13];
rhs[28,1]:=n;
for i in [1..12] do
  lhs[25,i+2]:=1;
  lhs[26,i+2]:=-1;
  lhs[27,i+2]:=c[1,i]-c[1,13];
  lhs[28,i+2]:=1;
end for;

for i in [1..28] do
  relations[i,1]:=-1;
end for;
objective[1,1]:=-1;
objective[1,2]:=1;

sol:=MinimalIntegerSolution(lhs,relations,rhs,objective);
x13:=n;
for i in [1..12]do
  x13 := x13-sol[1,i+2];
end for;
print sol, x13;

bused:=x13*c[1,13];
for i in [1..12] do
  bused:=bused+sol[1,i+2]*c[1,i];
end for;
print bused;