Sampling distributions - why do I multiply by combination / permutation?


Say I have a bag with a very large number of disks with the numbers $1$ and $2$ on them. Say the disks are present in the proportions $0.25$ and $0.75$ respectively. The number of disks is so large that removing $3$ disks has no impact on the proportion of each type of disk in the bag.

I remove three disks from the bag and I am interested in the sampling distribution of various statistics such as the median. For example, the draw $112$ has a median of $1$.

Now when I look at the solutions to questions of this type, I have to multiply the combined probabilities by the number of orderings of the disks, as if they were selected in order. In the example of drawing $112$, the “probability component” would be $(0.25)^2(0.75)^1$. However, in the solutions provided I need to multiply this number by $3$ because of the three different ways of ordering the numbers on the disks: $112$, $121$, and $211$.
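A minimal sketch of where the factor $3$ comes from, assuming each draw is independent with $P(1) = 1/4$ and $P(2) = 3/4$ (the "very large bag" assumption). It enumerates all ordered draws with `itertools.product`, picks out the orderings of two 1s and one 2, and builds the sampling distribution of the median by grouping ordered outcomes. The names `P`, `sequence_probability` and `median_dist` are illustrative only.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Assumption: independent draws with P(1) = 1/4 and P(2) = 3/4.
P = {1: Fraction(1, 4), 2: Fraction(3, 4)}

def sequence_probability(seq):
    """Probability of one specific ordered sequence of draws."""
    prob = Fraction(1)
    for disk in seq:
        prob *= P[disk]
    return prob

# All 2**3 = 8 ordered outcomes of three draws.
ordered = list(product([1, 2], repeat=3))

# Every ordering of two 1s and one 2 has the same probability, 3/64.
two_ones_one_two = [s for s in ordered if sorted(s) == [1, 1, 2]]
print(two_ones_one_two)  # [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
print(sum(sequence_probability(s) for s in two_ones_one_two))  # 9/64

# Sampling distribution of the median: add up the ordered outcomes
# that share the same median.
median_dist = {}
for s in ordered:
    m = median(s)
    median_dist[m] = median_dist.get(m, 0) + sequence_probability(s)
print(median_dist)  # median 1 with probability 5/32, median 2 with 27/32
```

The median is $1$ exactly when at least two disks show $1$, so its probability is $(0.25)^3 + 3(0.25)^2(0.75) = 10/64$. Dropping the factor $3$ would give only $4/64$.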

What I don’t understand is why this multiplication by the number of possible positions of the $2$, i.e. $\binom31$, is necessary. If I put my hand in the bag and select $3$ disks in one go, there is no sense in which there is a “first” disk, a “second” disk or a “third” disk, so why do I multiply by the number of permutations of the three objects?

Update following comments from Siong and Dave K

OK, I think I understand. Consider $P(A) = 0.25$ and $P(B) = 0.75$, where A and B are independent events. In my model, if three events could occur simultaneously, the “3-events” would be AAA, AAB, ABB and BBB, where order is not important, so that, for example, ABB = BAB = BBA. Hence, because A and B are independent, $P(AAA) = 0.25^3$, $P(AAB) = (0.25)^2(0.75)$, $P(ABB) = (0.25)(0.75)^2$ and $P(BBB) = 0.75^3$.

In my model, since the “3-events” are mutually exclusive and should be exhaustive, $P(AAA) + P(AAB) + P(ABB) + P(BBB) = \frac{1+3+9+27}{64} = \frac{40}{64}$, which is nonsense, since the probabilities of mutually exclusive, exhaustive events should sum to $1$.

So your model is correct, i.e. the sum is $(0.25)^3 + 3(0.25)^2(0.75) + 3(0.25)(0.75)^2 + (0.75)^3 = 1$. Now I have to go away and think about whether it is possible to have a model of independent simultaneous events, ignoring any considerations of physical reality.
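A quick exact check of the two sums above, sketched with Python's `fractions.Fraction` so that no floating-point rounding creeps in. The names `p_a` and `p_b` are illustrative stand-ins for $P(A)$ and $P(B)$.

```python
from fractions import Fraction

p_a, p_b = Fraction(1, 4), Fraction(3, 4)

# Unordered model: one term per multiset AAA, AAB, ABB, BBB,
# each weighted as if it could happen in only one way.
unordered_total = p_a**3 + p_a**2 * p_b + p_a * p_b**2 + p_b**3

# Counting orderings: AAB and ABB each arise from 3 ordered sequences.
# This is the binomial expansion of (p_a + p_b)**3 = 1**3.
weighted_total = p_a**3 + 3 * p_a**2 * p_b + 3 * p_a * p_b**2 + p_b**3

print(unordered_total)  # 5/8, i.e. 40/64
print(weighted_total)   # 1
```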


There are 2 best solutions below

BEST ANSWER

$(0.25)^2(0.75)$ is the probability that you draw $(1,1,2)$ sequentially (i.e. when order matters).

It is also the probability that you draw $(1,2,1)$.

It is also the probability that you draw $(2,1,1)$.

However, if we do not care about the order, any of the above sequences is good. That is, we want to compute the probability that you get one of $(1,1,2)$, $(1,2,1)$ or $(2,1,1)$, and since these outcomes are mutually exclusive, we add their probabilities.
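Written out as a worked sum, with $P(1) = 0.25$ and $P(2) = 0.75$ on each draw:

$$
P(\text{two 1s and one 2, in any order}) = P(1,1,2) + P(1,2,1) + P(2,1,1) = 3(0.25)^2(0.75) = \tfrac{9}{64} = 0.140625
$$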

If you care about the order, the following table is of interest to you: \begin{array}{|c|c|} \hline \text{outcome} & \text{probability} \\ \hline (1,1,1) & (0.25)^3 \\ \hline \color{blue}{(1,1,2)} & \color{blue}{(0.25)^2(0.75)} \\ \hline \color{blue}{(1,2,1)} & \color{blue}{(0.25)^2(0.75)} \\ \hline \color{blue}{(2,1,1)} & \color{blue}{(0.25)^2(0.75)}\\ \hline (1,2,2) & (0.75)^2(0.25)\\ \hline (2,2,1) & (0.75)^2(0.25)\\ \hline (2,1,2) & (0.75)^2(0.25) \\ \hline (2,2,2) & (0.75)^3 \\ \hline \end{array}

If not, the following table, which summarizes the number of $2$s, is the one that matters for you.

\begin{array}{|c|c|} \hline \text{number of } 2 & \text{probability} \\ \hline 0 & (0.25)^3 \\ \hline \color{blue}{1} & \color{blue}{3(0.25)^2(0.75)} \\ \hline 2 & 3(0.75)^2(0.25)\\ \hline 3 & (0.75)^3 \\ \hline \end{array}

See how the second table can be constructed from the first table by grouping rows together. Also notice that the probabilities sum to $1$ in both tables.
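The grouping step can be sketched in a few lines of Python, again assuming independent draws with $P(1) = 0.25$ and $P(2) = 0.75$. The names `first_table` and `second_table` are hypothetical labels for the two tables above. The loop checks that the grouped probabilities match $\binom{3}{k}(0.75)^k(0.25)^{3-k}$, i.e. that grouping is exactly where the binomial coefficient comes from.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

p1, p2 = Fraction(1, 4), Fraction(3, 4)

# First table: every ordered outcome with its probability.
first_table = {}
for outcome in product([1, 2], repeat=3):
    k = outcome.count(2)  # number of 2s in this ordered outcome
    first_table[outcome] = p2**k * p1**(3 - k)

# Second table: group rows of the first table by the number of 2s.
second_table = defaultdict(Fraction)  # Fraction() == 0
for outcome, prob in first_table.items():
    second_table[outcome.count(2)] += prob

for k in range(4):
    # Each group has comb(3, k) rows of equal probability.
    assert second_table[k] == comb(3, k) * p2**k * p1**(3 - k)
    print(k, second_table[k])

print(sum(first_table.values()), sum(second_table.values()))  # 1 1
```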


When you say there is no sense in which any disk is "first", you have merely decided to ignore any sense in which any disk is first.

In an actual physical implementation of this experiment, you reach into a bag and touch disks with your fingers in order to pick them up and pull them out. Without looking, can you guide your finger so accurately that it touches three disks simultaneously, none of them even one microsecond earlier than the others? And then guide a second finger to grasp all three disks simultaneously from the other side so you can pick them all up at once?

Perhaps if there happens to be a stack of three disks near the mouth of the bag, we might charitably say you were lucky enough to grab that stack and that your fingers made contact with the edges of all three disks simultaneously. If the stack consists of one "2" and two "1"s, however, the "2" has to be somewhere in the stack: top, middle, or bottom. That's three different ways to pick up the chips.

The fact that you may prefer to ignore these distinctions does not oblige everyone else to ignore them. And when watching someone draw three disks from a bag, where there is enough randomness and symmetry that the usual simplistic probability models work reasonably well, two different mathematicians should not get two radically different probability results for the same event. If they do, repeating the experiment multiple times will show soon enough that the real world disagrees with one of the mathematicians.

If the chips were all in some sort of macroscopic quantum superposition such that every single chip "occupied" the exact same location in space as every other chip, you might be able to select three chips in such a way that there is actually no "order" in the selection at all. In that case I'm not sure the usual balls-in-an-urn model applies. But we tend to prefer models that are closer to something we can easily physically realize.

This exercise is easy enough to approximate in real life. You could take a pile of chips of one color and three times as many chips of another color, mix them up in a big bag, pull out three chips, and then write a tick mark next to the word "yes" if you get two of color number 1 and one of color number 2, "no" in any other case. Repeat the experiment a few dozen times and count how many tick marks you have written next to each word. Compare the ratio to what you would expect by computing the probabilities without considering the three chips in sequence and what you would expect if you do consider them in sequence. Which calculation was closer to the actual observation?
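For anyone without a bag of chips at hand, here is a minimal simulation sketch of the same experiment. It assumes a hypothetical finite bag of 250 chips of color 1 and 750 of color 2 (three times as many), draws three chips without replacement via `random.Random.sample`, and uses far more repetitions than a few dozen so the ratio settles down. The function name `fraction_yes` is illustrative.

```python
import random

def fraction_yes(trials, n_color1=250, n_color2=750, seed=0):
    """Estimate P(two of color 1 and one of color 2) by repeated draws."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    bag = [1] * n_color1 + [2] * n_color2  # three times as many color-2 chips
    yes = 0
    for _ in range(trials):
        draw = rng.sample(bag, 3)          # three chips, without replacement
        if draw.count(1) == 2:             # two of color 1, hence one of color 2
            yes += 1
    return yes / trials

observed = fraction_yes(100_000)
ordered_prediction = 3 * 0.25**2 * 0.75    # 0.140625: counts the 3 orderings
unordered_prediction = 0.25**2 * 0.75      # 0.046875: ignores the orderings
print(observed, ordered_prediction, unordered_prediction)
```

With a bag this large, drawing without replacement barely changes the proportions, so the observed ratio should land near the ordered prediction and far from the unordered one.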