Proportional division of a number (given percentages)


I have clusters of a given length (minimum length = 3). I would like to split the points of each cluster into three sets according to given percentages (e.g. train = 70%, test = 20%, validation = 10%). Each set should get at least one data point, and the three resulting counts should match the stated percentages as closely as possible.

So, for instance:

cluster length = 3: train: 1, test: 1, validation: 1

cluster length = 7: train: 4 (floor), test: 2, validation: 1

cluster length = 10: train: 7, test: 2, validation: 1

I was wondering if there is a smooth way of doing it.

Best answer

Since the validation set is the smallest, start by determining its size. If you have $n$ points, then $$|\text{Validation set}| = \max\{1, \text{int}(0.1\cdot n)\}$$ where $\text{int}(x)$ is $x$ rounded to the nearest integer. Next, using the 20% test share from the question, $$|\text{Test set}| = \max\{1,\text{int}(0.2\cdot n)\}$$ and finally $$|\text{Training set}| = n-|\text{Validation set}|-|\text{Test set}|.$$ For $n \ge 3$ this always leaves at least one point for the training set.
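The rule above can be sketched in Python as follows, using the percentages from the question (test = 20%, validation = 10%, training takes the remainder). The function name `split_sizes`, its parameters, and the helper `round_half_up` are illustrative names chosen here, not part of the original answer. Halves are assumed to round up, since Python's built-in `round` rounds halves to the nearest even integer.

```python
import math


def split_sizes(n, test_frac=0.2, val_frac=0.1):
    """Return (train, test, validation) sizes for a cluster of n points."""
    if n < 3:
        raise ValueError("need at least 3 points to give each set one")

    def round_half_up(x):
        # Assumption: "nearest integer" rounds halves up.
        # The built-in round() would round halves to even instead.
        return math.floor(x + 0.5)

    # Size the smallest set first, then the test set, each with a floor of 1.
    val = max(1, round_half_up(val_frac * n))
    test = max(1, round_half_up(test_frac * n))
    # The training set takes whatever remains.
    train = n - val - test
    return train, test, val


for n in (3, 7, 10):
    print(n, split_sizes(n))
```

For cluster lengths 3 and 10 this reproduces the splits in the question. For length 7 it gives train = 5, test = 1, validation = 1 rather than 4/2/1, because 0.2 · 7 = 1.4 rounds down to 1. With fractions much larger than the defaults the remainder for training could drop below one, so the sketch is only meant for splits where training is the dominant share.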