Showing that any of n balls drawn without replacement has the same probability of being a particular colour


Suppose that $X\sim \text{HGeom}(w,b,n)$ describes draws from an urn containing $w$ white and $b$ blue balls (where $w+b=n$). Let $X_j$ be the indicator random variable of the $j$-th ball drawn being white, when the balls are drawn without replacement. My question is how to show, by symmetry, that $E(X_j) = w/(w+b)$ for any $j$.

Clearly,

$$\begin{align} E(X_1) &=\frac{w}{w+b}\\[3ex] E(X_2) &= \frac{w}{w+b}\left(\frac{w-1}{w+b-1}\right) +\frac{b}{w+b} \left(\frac{w}{w+b-1}\right) =\frac{w}{w+b} \end{align} $$

Does this pattern continue for all $j$ up to $n$? Could it be extended if there were balls of $k$ different colours?


There are 2 best solutions below


I believe there are a number of elegant proofs, perhaps relying on the linearity of expectation and on indicator variables (for an indicator, expectation = probability). Joseph K. Blitzstein discusses a similar problem, which can be paraphrased as follows with regard to the symmetry insight:

This is true by symmetry. The first ball is equally likely to be any of the $b + w$ balls, so the probability of it being white is $\frac{w}{w +b}.$ But the second ball is also equally likely to be any of the $b + w$ balls (there aren’t certain balls that enjoy being chosen second and others that have an aversion to being chosen second); once we know whether the first ball is $W$ we have information that affects our uncertainty about the second ball, but before we have this information, the second ball is equally likely to be any of the balls. Alternatively, intuitively it shouldn’t matter if we pick one ball at a time, or take one ball with the left hand and one with the right hand at the same time. By symmetry, the probabilities for the ball drawn with the left hand should be the same as those for the ball drawn with the right hand.
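The symmetry claim can be checked by brute force on a small urn. The following is only a sketch: the function name `position_marginals` and the urn sizes are illustrative choices, and the balls are treated as distinguishable so that every draw order is equally likely.

```python
from fractions import Fraction
from itertools import permutations

def position_marginals(w, b):
    """Return [P(j-th drawn ball is white)] for j = 1..w+b, by enumerating all draw orders."""
    n = w + b
    colours = ["W"] * w + ["B"] * b        # ball k has colour colours[k]
    white_counts = [0] * n
    total = 0
    for order in permutations(range(n)):   # each order of the n labelled balls is equally likely
        total += 1
        for pos, ball in enumerate(order):
            if colours[ball] == "W":
                white_counts[pos] += 1
    return [Fraction(c, total) for c in white_counts]

marginals = position_marginals(3, 2)
print(marginals)   # every entry is Fraction(3, 5)
```

Every position has the same probability $w/(w+b)$ of holding a white ball, as the symmetry argument predicts.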

Another way to see it: treat the $w+b$ balls as distinguishable, so that every possible result is an ordering of the balls by extraction. Any such ordering can be shifted cyclically by one position, moving each ball to the next extraction point and the last ball to the first. Repeating the shift, with period $w+b,$ matches every result with $w+b$ results in which the relative order of the balls does not change and each ball occupies every possible extraction point exactly once. Since all orderings are equally likely, each ball is equally likely to be extracted at any position.


In your calculation you get to $E(X_1)=\frac{w}{w+b}$ and $E(X_2)=\frac{w}{w+b}\left(\large \Box \right),$ where happily, $\large \Box =1,$ and hence, $E(X_1)=E(X_2).$ So what we want is for this pattern to hold for all $X_i,$ such that $E(X_i)=\frac{w}{w+b}\left(\color{red}{\large \Box} \right)$ with $\color{red}{\large \Box}=1$ for every $i.$

This pattern can be teased out by seeing what happens next, in the case of $X_3:$

$$\begin{align} E(X_3) &= \left(\frac{w}{w+b}\right) \left(\frac{w-1}{w+b-1}\right)\left(\frac{w-2}{w+b-2}\right) +2\left(\frac{b}{w+b}\right) \left(\frac{w}{w+b-1}\right)\left(\frac{w-1}{w+b-2}\right)\\[2ex] &\quad+ \left(\frac{b}{w+b}\right) \left(\frac{b-1}{w+b-1}\right)\left(\frac{w}{w+b-2}\right) \end{align}$$
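The same conditioning on the colour of the first ball can be automated for any $j$. This is a minimal sketch with exact rational arithmetic; the function name `prob_white_at` is a hypothetical helper, and it assumes $1 \le j \le w+b$.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def prob_white_at(j, w, b):
    """P(j-th ball drawn is white) from an urn with w white and b other balls (1 <= j <= w+b)."""
    n = w + b
    if j == 1:
        return Fraction(w, n)
    total = Fraction(0)
    if w > 0:   # first ball white, then the (j-1)-th draw from the reduced urn
        total += Fraction(w, n) * prob_white_at(j - 1, w - 1, b)
    if b > 0:   # first ball not white
        total += Fraction(b, n) * prob_white_at(j - 1, w, b - 1)
    return total

values = [prob_white_at(j, 4, 3) for j in range(1, 8)]
print(values)   # each value is Fraction(4, 7)
```

For $j=3$ the recursion expands into exactly the terms of the sum for $E(X_3)$.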

Clearly we'll always be able to extract $E(X_1)=\frac{w}{w+b}$ as a factor in front of the sum, since a factor $w$ appears in the numerator of every term and a factor $w+b$ in the denominator of every term. What remains to be proven is that the sum multiplying $E(X_1)$ is always equal to $1:$

$$\begin{align} 1 &= \frac{(w-1)(w-2)}{(w+b-1)(w+b-2)} + \frac{2\,b\,(w-1)}{(w+b-1)(w+b-2)} + \frac{b\,(b-1)}{(w+b-1)(w+b-2)} \\[3ex] \implies (w+b-1)(w+b-2) &= (w-1)(w-2) + 2\,b\,(w-1) + b\,(b-1)\\ &= {2 \choose 0}(w-1)(w-2) +{2 \choose 1}\, b\,(w-1) + {2\choose 2}\, b\,(b-1) \end{align}$$

But the LHS is the number of 2-permutations (ordered selections of two balls) from $(w - 1) + b$ balls, while the RHS counts the same selections in the manner of a binomial expansion, split by how many of the two balls come from the $w-1$ remaining $\text{W}$hite balls and how many from the $b$ $\text{B}$lue ones.

This pattern will hold for any $X_i,$

$$\begin{align} &\left((w-1) + b\right)\left((w-2) + b\right)\cdots\left((w-(i-1)) +b\right)\\[3ex] &\quad= {i-1 \choose 0} (w-1)(w-2)\cdots (w-(i-1))\\ &\qquad+\cdots + {i-1 \choose j} (w-1)\cdots (w-(i-1-j))\;b\,(b-1)\cdots(b-(j-1))\\ &\qquad+\cdots+{i-1 \choose i-1}\, b\,(b-1)\cdots (b-(i-2)) \end{align}$$
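The general identity can be checked numerically. The sketch below reads the left-hand side as a product of $i-1$ consecutive factors starting at $(w-1)+b$, and the $j$-th term on the right as $i-1-j$ falling factors of $w-1$ times $j$ falling factors of $b$; this is the Vandermonde identity for falling factorials. The helper names `falling` and `identity_holds` are illustrative.

```python
from math import comb

def falling(x, m):
    """Falling factorial x (x-1) ... (x-m+1); the empty product (m = 0) is 1."""
    result = 1
    for k in range(m):
        result *= x - k
    return result

def identity_holds(w, b, i):
    """Check the identity behind E(X_i) = w/(w+b) for an urn with w white and b other balls."""
    m = i - 1                                  # number of draws before the i-th
    lhs = falling((w - 1) + b, m)
    rhs = sum(comb(m, j) * falling(w - 1, m - j) * falling(b, j) for j in range(m + 1))
    return lhs == rhs

checks = [identity_holds(w, b, i)
          for w in range(1, 7) for b in range(0, 7) for i in range(1, w + b + 1)]
print(all(checks))
```

A numerical check over small urns is not a proof, but it confirms the indexing of the terms.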


There are several perspectives from which to recognize this fact without the tedious calculation. Here is my understanding of it.

A common fallacy is to think that when $j > 1$ we have already obtained information from $X_1, X_2, \ldots, X_{j-1}$, so the probability must have changed from $w / (w + b)$. Although the $X_j$ are dependent, the question asks about the marginal distribution of $X_j$, not the conditional distribution of $X_j \mid X_1, X_2, \ldots, X_{j-1}$. The conditional probability corresponds to the experiment in which we draw the first $j-1$ balls, put them on the table and observe their colors, so that the probability for the next draw depends on the colors of the balls already drawn. The marginal distribution, in contrast, corresponds to not observing the colors of the first $j-1$ balls: we put them into another bag immediately after drawing them. So the marginal distribution of each draw is the same, assuming all balls are equally likely to be drawn.
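The difference between the marginal and the conditional probability can be made concrete for the second draw. This is a small enumeration sketch, assuming $w \ge 1$ and $w+b \ge 2$; the function name `marginal_and_conditional` is only illustrative.

```python
from fractions import Fraction
from itertools import permutations

def marginal_and_conditional(w, b):
    """Return (P(X_2 = 1), P(X_2 = 1 | X_1 = 1)) by enumerating all equally likely draw orders."""
    n = w + b
    is_white = [True] * w + [False] * b        # ball k is white iff is_white[k]
    orders = list(permutations(range(n)))
    second_white = [o for o in orders if is_white[o[1]]]
    first_white = [o for o in orders if is_white[o[0]]]
    both_white = [o for o in first_white if is_white[o[1]]]
    marginal = Fraction(len(second_white), len(orders))
    conditional = Fraction(len(both_white), len(first_white))
    return marginal, conditional

marginal, conditional = marginal_and_conditional(3, 2)
print(marginal, conditional)   # 3/5 1/2
```

With $w=3$ and $b=2$ the marginal is $w/(w+b) = 3/5$, while conditioning on a white first ball gives $(w-1)/(w+b-1) = 1/2$.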

Assigning an order to the balls is like choosing a permutation. Label the $n$ balls with the numbers $1, 2, \ldots, n$ and assume every permutation is equally likely, so that the probability that ball $i$ is put in the $j$-th position equals the number of favorable permutations divided by the total number of permutations. Each favorable permutation can be shifted cyclically so that ball $i$ lands in any other position $k$. (E.g. if the original permutation is $(1, 3, 2, 4)$, we can shift it to $(4, 1, 3, 2), (2, 4, 1, 3), (3, 2, 4, 1)$.) Grouping the permutations in this way shows that ball $i$ is equally likely to be in any one of the $n$ positions, each with probability $1/n$. Now paint the first $w$ balls white; summing their probabilities gives $w/n$ for the chance that position $j$ holds a white ball.
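The grouping by cyclic shifts can be illustrated with a short sketch. The helper names `rotations` and `ball_positions_in_orbit` are hypothetical, and positions are counted from 0 in the code.

```python
from itertools import permutations

def rotations(order):
    """All cyclic shifts of a draw order, starting with the order itself."""
    n = len(order)
    return [order[n - s:] + order[:n - s] for s in range(n)]

def ball_positions_in_orbit(order, ball):
    """Sorted positions (0-based) that `ball` occupies across the cyclic shifts of `order`."""
    return sorted(rot.index(ball) for rot in rotations(order))

example = rotations((1, 3, 2, 4))
print(example)   # [(1, 3, 2, 4), (4, 1, 3, 2), (2, 4, 1, 3), (3, 2, 4, 1)]

# Group all 4! = 24 permutations into classes of cyclic shifts.
orbits = {frozenset(rotations(p)) for p in permutations((1, 2, 3, 4))}
print(len(orbits))   # 6 classes of 4 permutations each

print(ball_positions_in_orbit((1, 3, 2, 4), 3))   # [0, 1, 2, 3]
```

Within each class every ball visits every position exactly once, so each ball has probability $1/n$ of landing in any given position.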