Calculating cumulative error in distribution of elements


I am dealing with a real-life situation where I have a distribution of 12 unique elements that can be permuted in any way possible (so 12! = 479,001,600 possible permutations). The index position of each element has a fixed measure, and each element has a dimension that fits that measure but will almost never fit 100%. In other words, there will always be some error that must be kept within a threshold, and even when it is within the threshold it is best to minimize it.

Think of having 12 children of similar size and 12 pairs of differently sized shoes. The ideal would be to assign a pair to each child so that every pair fits, but if that's not possible, it's okay if a couple of children are off by half a size each. You also want to avoid the outlier scenario where 11 children fit almost perfectly but one kid is off by two sizes (hypothetically, because two sizes would be more than the allowed threshold). It is better to have every kid off by half a size, even though the cumulative error (all added together) is then more than the outlier scenario's two sizes, because the drastic deviation of a single child is worse.

I need to come up with a measurement that will reveal the most optimal distribution (permutation) of elements with regard to this error. Ideally, if every error were 0, the cumulative measurement for the set would be 0 (perfect). But inevitably there will be some errors. As explained in the paragraph above, just summing the individual errors would not suffice, because a permutation where 11 elements have 0 error but one has an error just under the threshold would disproportionately throw off the balance.

I was thinking of using the standard deviation to represent how optimal each permutation is with regard to error. Is this a good way to approach it, or should I use something else?
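
For concreteness, here is a minimal sketch (in Python, with entirely made-up slot measures and element dimensions) of the two candidate scores I have in mind: the plain sum of the errors versus the standard deviation of the errors.

```python
import statistics

# Made-up data purely for illustration: the fixed measure of each of the 12
# index positions, and the dimension of each of the 12 elements.
slot_measures = [10.0, 10.5, 11.0, 9.5, 12.0, 10.0, 11.5, 9.0, 10.5, 12.5, 11.0, 9.5]
element_dims  = [10.2, 10.4, 11.1, 9.5, 11.8, 10.1, 11.6, 9.2, 10.3, 12.4, 11.2, 9.4]

def per_slot_errors(perm):
    """Absolute fit error in each slot when element perm[i] is placed in slot i."""
    return [abs(slot_measures[i] - element_dims[p]) for i, p in enumerate(perm)]

def score_sum(perm):
    """Plain sum of the errors -- blind to one large outlier among many zeros."""
    return sum(per_slot_errors(perm))

def score_stdev(perm):
    """Standard deviation of the errors -- penalizes an uneven spread of error."""
    return statistics.pstdev(per_slot_errors(perm))

identity = list(range(12))
print(score_sum(identity), score_stdev(identity))
```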


There is 1 answer below.

Problems like this all come down to a choice of error function. The conventional choice would be least squares: in each slot, measure the distance between actual and predicted, square it, then sum over the slots. This tends to give plausible results in a variety of situations, and it has the huge advantage of being computationally fairly tractable. Try it! It is a theorem of Gauss that the least-squares fit minimizes error variance (in a fairly broad class of situations), so it may be what you are looking for.
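
For instance, here is a minimal sketch of that least-squares score, with invented slot measures and element dimensions and only five elements so that the brute-force search over permutations stays fast (the same idea works for all 12! permutations, just more slowly):

```python
from itertools import permutations

# Invented data for illustration; kept small so brute force is instant.
slot_measures = [10.0, 10.5, 11.0, 9.5, 12.0]
element_dims  = [10.2, 10.4, 11.1, 9.5, 11.8]

def least_squares_score(perm):
    """Sum over slots of (slot measure - assigned element's dimension)^2."""
    return sum((slot_measures[i] - element_dims[p]) ** 2
               for i, p in enumerate(perm))

# Exhaustively score every permutation and keep the best one.
best = min(permutations(range(len(slot_measures))), key=least_squares_score)
print(best, least_squares_score(best))
```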

If you find you don't care for the results, you can try other measures of error, perhaps a function that grows faster than $(\text{distance})^2$. That will tend to kill off outliers for sure, but the calculation will be harder.
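
For example, here is a drop-in replacement for least_squares_score in the sketch above, using a fourth power as just one illustrative choice of faster-growing function:

```python
def quartic_score(perm):
    """(distance)^4 instead of (distance)^2: a single outlier is punished far more heavily."""
    return sum((slot_measures[i] - element_dims[p]) ** 4
               for i, p in enumerate(perm))
```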