Why is a simulation of a probability experiment off by a factor of 10?

2.3k Views Asked by At

From a university homework assignment:

There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell ($i.e.$ each cell has at least $1$ ball)?

The answer is $\large\frac{\binom{11}{7}}{\binom{19}{7}}$ which is about $0.0065$. I reached this result independently, and it was confirmed by the official homework solution of the university.

A friend of mine and I independently wrote Python simulations that run the experiment many times (tested up to $1,000,000$). We used both Pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around $0.09$ which is a factor of $10$ or even a bit more off from the expected theoretical result.

Have we made some wrong assumptions? Any ideas for this discrepancy?

P.S.: Here is the Python code that I wrote, and maybe there is some faulty logic there.

def run_test():
    global count, N

    def run_experiment(n_balls, n_cells, offset):
        cells = [0] * n_cells
        # toss balls randomly to cells:
        for j in range(n_balls):
            cells[random.randrange(0, n_cells)] += 1
            # cells[int(lines[offset + j])] += 1
        cells = sorted(cells)
        # print(cells)

        # check if there is an empty cell. if so return 0, otherwise 1:
        if cells[0] == 0:
            return 0
        return 1

    count = 0
    N = 1000000
    offset = 0
    N_CELLS = 8
    N_BALLS = 12
    # iterate experiment
    for i in range(N):
        result = run_experiment(N_BALLS, N_CELLS, offset=offset)
        count += result
        offset += N_CELLS

    print("probability:", count, "/", N, "(~", count / N, ")")
3

There are 3 best solutions below

24
On

In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $\frac1{8^{12}}$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $\frac1{19 \choose 7}$ over a million times more likely but difficult to simulate

If the balls are distinguishable, the probability all eight boxes are full is $$\frac{8! \, S_2(12,8)}{8^{12}}$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?

If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $\{0,1,2,3,\ldots,18\}$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive

0
On

Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^{12}\approx7\times10^{10}$ elements.

Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has ${19\choose7}\approx 5\times10^4$ elements.

The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.

Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:

weight = 1
for cell_population in cells:
  weight *= math.factorial(cell_population)

At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.

2
On

The original problem is posed, so far as I can tell, to show the difference between combinations and permutations. In nature, there is no such thing as indistinguishable balls. Semi-infinite tests (e.g. Las Vegas) have shown this to be true.

Now, if the problem really wants you to use "indistinguishable" balls for the purposes of solving the problem, then yes, you need to use combinations and not permutations when calculating all the ways the indistinguishable balls are placed into the containers. And of course, you need to use permutations for the numbered balls, as they are distinguishable from each other and from the collection of indistinguishable balls.

Now, I believe that Chris Culter's calculations reflect this difference. Whether your Python code does this correctly we can't say until we see the code.