I have a set of elements. To simplify here, these elements are $+$ and $-$.
I need to split this set into n sets keeping the proportion of $+$ and $-$. These new sets need to have similar size: their size must differ in one or zero.
For example, if I have a set with 7 $+$ and 4 $-$, and I need to split it into 5 sets. Each of these sets will have: 3 elements, 2 elements, 2 elements, 2 elements and 2 elements.
To solve this, I have done the following:
- Calculate the percentage of $+$ elements: $p = 4/11 = 0.37 => 37\%$
- Calculate the percentage of $-$ elements: $n = 7/11 = 0.63 => 63\%$
Now I have the percentages, I use the following formulas to calculate how many $+$ and $-$ elements I will have on each subset.
If $m$ is the number of elements on a subset, I calculate:
$mp = m * p$
$mn = m * n$
Where $mp$ is the number of $+$ elements, and $mn$ is the number of $-$ elements on each subset.
Those numbers always have decimals, so I need to round then.
If $mp > mn$ I do:
$mp = \lceil mp\rceil$
$mn = \lfloor mn \rfloor $
And, if $mp < mn$ I do:
$mp = \lfloor mp \rfloor $
$mn = \lceil mn\rceil$
But it doesn't work in this example, because when I do:
$mp = 2 * 0.37 = 0.74$
$mn = 2 * 0.63 = 1.26$
I get than $mn > mp$, so:
$mp = \lfloor 0.74 \rfloor = 0 $
$mn = \lceil 1.26 \rceil = 2$
I don't use any of the $+$ elements in four subsets. This is an error.
My problem is I don't know what to do with the decimals.
How can I do to keep the proportion and avoid these kind of problems?
As was mentioned in the comments, there's only so much you can achieve with discrete quantities, so you have to compromise somewhere. In general there are different ways to achieve a compromise, so you'd want to check what are the desirable properties that must be enforced. I'm not too sure whether this applies in this case though.
First an example. Say you want to play a card game, and have to deal 52 cards evenly between 5 people. Around here it is common to give one card to each player in a fixed order, until you run out of cards. That way you end up with a distribution of $11$, $11$, $10$, $10$, $10$ cards.
Now if we get back to your problem, say you have $P$ $+$ elements, $M$ $-$ elements, for a total of $T=P+M$. You want to split these into $N$ sets.
To do so, sort your elements in an array, say $A$ with:
Then for $0\le k < N$, let the $k$-th bucket/subset be: $B_k = \left\{ A\left[qN+k\right],\ q\in\mathbb Z \right\}$. Basically each bucket gets one element every $N$ elements.
With this method, you're guaranteed that:
I was too lazy to look how far/close the proportions you obtain are from/to the target proportion... but assuming all elements must be distributed, the four possible ratio look like best approximations.
On a side note, this method simply generalizes to more than two types of elements. Also if you let $\ell$ be the number of types of elements, I think this method can be implemented in $O(\ell+N)$ if you just need the quantities of each element in each set. Kinda too lazy to check that last statement.