Basic Monte Carlo simulation of indistinguishable balls and distinguishable bins

622 Views Asked by At

I'm having fun and learning about Monte Carlo methods in probability. During a thought experiment, I noticed that I don't know how to simulate one of the most basic questions in probability --- balls and bins.

Let me clarify.

Suppose we have 2 bins, $A$ and $B$, and two balls. I might ask what is the probability that bin $A$ has at least one ball? If the two balls are indisinguishable, then the three possibilities are $$\begin{array}{l|l} A&B \\ \hline \cdot \cdot & \\ \hline \cdot & \cdot \\ \hline & \cdot \cdot \end{array}$$ so that in $2/3$ of cases, $A$ has at least one ball. If the two balls are distinguishable (and maybe we call them ball $1$ and ball $2$, then there are four possibilities, $$\begin{array}{l|l} A&B \\ \hline 1, 2 & \\ \hline 1 & 2 \\ \hline 2 & 1 \\ \hline & 1,2 \end{array}$$ and in $3/4$ of cases, $A$ has at least one ball.

Now suppose that we wanted to simulate the bins-in-balls problem. For distinguishable balls in distinguishable bins, this is very clear. We could think of holding up each ball and putting it in each bin with equal random probability. This could lead to a simulation looking like the following pseudocode,

numtrials = 1000
successes = 0
foreach trial in numtrials:
    balls_in_A = 0
    foreach ball in [1,2]:
        if random < 0.5:
            balls_in_A += 1
    if balls_in_A > 0:
        successes += 1
print successes/numtrials.

This is pretty straightforward (and admittedly very simple). Now suppose I wanted to simulate the indistinguishable scenario. It's not clear to me how to do that.

I suppose one way would be to actually simulate a two stars and one bar type situation, but I'm thinking of this example as the simplest possible case, perhaps as a stand-in for a harder problem. [For instance, perhaps I would later be throwing indistinguishable darts at a dartboard and wanted to know how often some region had at least $k$ darts, or higher dimensional things of that flavor].

1

There are 1 best solutions below

2
On BEST ANSWER

Probabilities are dependent on what things are considered to be equally likely in the sample space. In your example that gives a result of $2/3$, the sample space is distinguishable configurations of two identical balls in two different bins. There are three such configurations, and if each is equally likely, the probability of there being at least one ball in the first bin is $2/3$.

If two indistinguishable balls are independently placed into two distinguishable bins with equal likelihood of going into each bin, however, the probability that the first bin ends up with at least one ball is not $2/3$. It’s $3/4$.

In the second example, the sample space is distinguishable configurations of two distinguishable balls in two different bins. There are four such configurations, and if each is equally likely, the probability of there being at least one ball in the first bin is $3/4$, whether the balls’ labels matter or not.

You can’t simulate the “indistinguishable balls in distinguishable bins problem” you described first by generating sequences of place-a-ball-into-a-bin events, because such sequences won’t generate the three possible configurations with equal likelihood. [The “classic” problem is usually a counting problem, not a probability problem.]

You later mention you might want to simulate “throwing indistinguishable darts at a dartboard” to estimate “how often some region had at least $k$ darts.” That isn’t a generalization of the first question (which is about fractions of distinguishable configurations, not chances that certain things happen after a sequence of placement events); you can, however, simulate it by performing a simulation like your “$3/4$” one and then ignoring the darts’/balls’ labels.

I realize this isn’t an answer to “How do I simulate this question?”, but maybe it will help you realize that you may be trying to simulate something that isn’t amenable to a simulation because it’s not really a probability question.