Excuse the non-mathematical way I've phrased the question.
I have the following problem:
I have $N$ square paper documents with side lengths between $150$mm and $860$mm. I know each document side's length.
I need to create $3-4$ differently sized boxes to fit all the documents, e.g. Three box types: Box $1$ side $l_1=300$mm, Box $2$ side $l_2=600$mm, Box $3$: $l_3=860$mm.
There are as many boxes as documents, i.e. each document goes into its own separate box (of the smallest possible size so as to minimize waste of cardboard).
What is the best way to decide on the size of the boxes, so as to minimize the total amount of (surface area) of cardboard used?
I'm not necessarily looking for the analytical solution to this problem. Two ideas I've had:
a) Pick $l_1$ and $l_2$ values at random and calculate the total surface area of cardboard. Guess the values again, see if the total surface area is smaller, and so on and on.
b) A more analytical approach where I'm computing $l_1$ and $l_2$ value in say $1$mm increments and I calculate the total surface area for each combination of box lengths between say ($150$mm, $151$nmm,$860$mm) and ($858$mm,$859$mm,$860$mm).
What would you suggest is the most practical way of going about solving this?
BTW, I'm great with Excel, less so with Mathlab, etc. I can program well in Ruby if that helps in any way.
I have modeled this as a set covering problem, similar as in link.
The first step was to condense the data, by only storing the unique values and their count. This reduces the data from 1166 to 384 records. Make sure they are ordered by increasing size.
Then I define a binary variable
$$x_{i,j} = \begin{cases}1 & \text{if we put items $i$ through $j$ in the same box sizes $j$}\\ 0 & \text{otherwise}\end{cases}$$
Every item $k$ must be covered exactly once by a variable $x_{i,j}$. This leads to:
$$\sum_{i \le k \le j} x_{i,j} = 1 \>\>\text{ for all items $k$}$$
We need to cover exactly once, so this is sometimes called a set partitioning problem.
We have 4 different box sizes, so:
$$\sum_{i \le j} x_{i,j} = 4$$
The objective is
$$\min \sum_{i \le j} c_{i,j} x_{i,j}$$
where $c_{i,j}$ is the total area, i.e.
$$ c_{i,j} = \sum_{i \le k \le j} \mathit{count}_k \mathit{area}_j $$
This leads to a very large but easy MIP (Mixed Integer Programming) model. Its size is:
It takes about a minute to solve (on a slow laptop). The results look like:
This is a proven optimal solution.
For 3 different boxes I get:
As expected the objective deteriorates: we waste more space. Interestingly the smallest boxes are the same as for the previous case.
Alternative methods includes a network approach (see the answer by Paul Rubin) or a Dynamic Programming algorithm. Somehow I like the set partitioning model: the constraints can be written very compactly (and actually make intuitive sense).