Bound maximal probability in urn model

21 Views Asked by At

Let's say we draw elements from a list with the following distribution count: $$\{x: 10, y: 2, z: 3, w: 15\}.$$ We draw a single element from that list.

How can we adjust the counts of the elements such that the probability of drawing a class is bounded from above, e.g., $1/3$. For example, $5$ elements of class $w$ should be removed, but how should we distribute the missing elements on the other elements? We cannot add them to class $x$ since otherwise $x$ would have a probability higher than $1/3$. Is there a general rule to adjust the probabilities?

1

There are 1 best solutions below

0
On

I found the answer myself: Sort the elements in decreasing order by their counts. If count/overall_count is larger than the maximal probability, set it to the maximal probability and evenly distribute the difference to the remaining elements. Continue in this manner until all probabilities are smaller than the bound.

Here's a python snippet:

def _random_sample(c: Counter, size: int=1, max_proportion: float=0.4):
overall_freq = sum(c.values())
if 1/max_proportion > len(c):
    return list(np.random.choice(list(c.keys()), size=size, replace=False, p=[c[key]/overall_freq for key in c]))
n = len(c)
sorted_c = sorted(c.items(), key=operator.itemgetter(1), reverse=True)
probas: List[Tuple[Any, float]] = [(x[0], x[1]/overall_freq) for x in sorted_c]
for i, x in enumerate(probas):
    if x[1] > max_proportion:
        # distribute the probas
        probas[i] = (x[0], max_proportion)
        diff = x[1] - max_proportion
        for j in range((i+1), n):
            probas[j] = (probas[j][0], probas[j][1] + diff/(n-1-i))
    else:
        break
return list(np.random.choice([x[0] for x in probas], size=size, replace=False, p=[x[1] for x in probas]))