Intuition behind using linearity of expectation in the hypergeometric case


The formula for the expected value in a hypergeometric distribution is: $$ E(X) = {ns\over N} $$ where

  • $n$ is the number of samples,
  • $s$ is the number of success objects in the population, and
  • $N$ is the population size.
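The formula can be verified by brute force on a toy population (the values of $N$, $s$, $n$ below are arbitrary small choices): enumerate every equally likely $n$-subset and average the number of successes.

```python
from itertools import combinations
from fractions import Fraction

N, s, n = 7, 3, 4                           # toy population, successes, sample size
pop = [1]*s + [0]*(N - s)                   # 1 = success, 0 = failure
samples = list(combinations(range(N), n))   # all equally likely n-subsets
EX = Fraction(sum(sum(pop[i] for i in c) for c in samples), len(samples))
# EX comes out to exactly n*s/N
```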

The way we are supposed to see $X$ is: $$ X = I_1 + I_2 + I_3 + \ldots + I_n $$ where each $I_j$ equals $1$ if the $j$-th sample is a success and $0$ otherwise.

Clearly, each $I_j$ depends on the outcomes of the previous draws. However, using linearity of expectation, we are supposed to believe that we only need the unconditional probability of success on each trial.

I have two questions:

  1. What exactly does it mean for $I_j$ to indicate the success of the $j$-th sample? Are they talking about first taking $n$ samples and then checking the $j$-th one among them (if so, the entire thing makes sense; see question $2$)? Or are they talking about the event of the $j$-th draw, made after $j-1$ samples have already been taken (if so, taking unconditional probabilities doesn't seem to make sense)?

  2. How do you simply assume the unconditional probability of $I_j$ to be $s/N$, despite the fact that each trial has been encoded to be dependent on the previous trials?


There are 3 answers below.

BEST ANSWER

When you say that there are $s$ "good" objects in a pool of $N$ objects from which you are going to draw $n$ objects without replacement, the hypergeometric distribution describes what we know, before drawing any objects, about the probability that any given number of the "good" objects will end up in the set we eventually draw. So the hypergeometric distribution is fundamentally about probabilities that are not conditioned on any partial results of drawing from the pool.

I find it helpful to consider each object in the pool as a unique individual that either will or will not be among the $n$ objects drawn from the pool. As a simple concrete example, put $N = 3$ balls labeled A, B, and C in a bag, mix the contents of the bag thoroughly, and then draw all three balls one at a time. Consider drawing ball A to be a "success" and drawing any other ball to be a "failure," so $s = 1$. We will have exactly one "success" in the three one-ball samples, but it is equally likely to occur on the first, second, or third trial, each with probability $s/N = 1/3$.
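This three-ball example is small enough to check exhaustively: list all $3! = 6$ equally likely draw orders and count how often ball A lands in each position. A minimal Python sketch:

```python
from itertools import permutations
from fractions import Fraction

orders = list(permutations("ABC"))     # the 6 equally likely draw orders
probs = [Fraction(sum(1 for o in orders if o[pos] == "A"), len(orders))
         for pos in range(3)]
# ball A is equally likely to be the 1st, 2nd, or 3rd ball drawn: each prob is 1/3
```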

For another concrete example, consider a standard deck of $52$ cards. The deck has been thoroughly shuffled and placed on a table with all cards face down. Suppose that drawing a heart is a "success" while drawing any other card is a "failure".

If we just turn over the top card of the deck, it has probability $1/4$ of being a heart (since there are $s = 13$ hearts in a deck of $N = 52$ cards). If we count nine cards off the top of the deck without revealing their faces and then turn over the tenth card, again it has probability $1/4$ of being a heart. Likewise if we turned over the third card instead of the tenth, or the fifth card instead of any of the others.

Those $1/4$ probabilities are all unconditional probabilities that the card at some position in the deck will be a heart.

Now if we take the top $n = 20$ cards from the deck, then before we look at any of those cards, each of the $20$ cards has an unconditional probability of $s/N = 1/4$ of being a heart.

Note, however, that although these events have equal probability, they are not independent. For example, it is not possible for all $20$ cards to be hearts.
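The non-independence can be made concrete with exact arithmetic: the marginal probability of a heart at any fixed position is $13/52$, but the probability that the first two positions are both hearts is $(13/52)(12/51)$, not $(1/4)^2$.

```python
from fractions import Fraction

p_one = Fraction(13, 52)                      # P(any fixed position is a heart)
p_both = Fraction(13, 52) * Fraction(12, 51)  # P(positions 1 and 2 are both hearts)
# equal marginals, but p_both != p_one**2, so the events are not independent
```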

The expectation of a hypergeometric distribution, $E(X)$, is not conditioned on any partial set of observations. It's an unconditional expectation. In order to compute that expected value as a sum of expected values via linearity of expectation, we need all of those expected values to be unconditional.

The part that I think is hardest to understand intuitively is that when we apply linearity of expectation, it does not matter whether the individual "success" events are independent. It only matters that we know the unconditional probability of a "success" on each trial, and therefore the expected value of $I_k$ for each $k$.
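A quick Monte Carlo sketch illustrates this: drawing $n = 20$ cards without replacement many times, the average number of hearts lands at $n \cdot s/N = 5$ even though the individual indicators are dependent (toy simulation, fixed seed for reproducibility):

```python
import random

random.seed(0)                       # fixed seed so the estimate is reproducible
N, s, n = 52, 13, 20
deck = [1]*s + [0]*(N - s)           # 1 = heart, 0 = non-heart
trials = 100_000
total = sum(sum(random.sample(deck, n)) for _ in range(trials))
est = total / trials                 # close to n*s/N = 5
```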

So if you understand linearity of expectation, the computation of $E(X)$ is simple. If you do not truly understand linearity of expectation, then you may need to work on that understanding in order to make sense of $E(X)$.

ANSWER

I think you're right that it's not obvious. I think the following argument works:

Notice that a priori (without knowing anything about the individual draws), the first draw is just as likely to be a success as the last draw (in fact the probabilities are equal, since you only know the total count of successes).

One way to see this: suppose the $n$ draws contain $k$ successes, where $s$ of the $N$ items are good. A priori we know nothing about where those $k$ successful draws fall among the $n-k$ failed ones, so the distribution over permutations of these draws is uniform: each permutation is equally likely.

Since each permutation is equally likely, $\mathbb E I_i = \mathbb E I_j$ for all $i, j$. So we just need to compute $\mathbb E I_1 = P(\text{first draw is a success}) = s/N$, giving $E(X) = n \cdot s/N$.
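This symmetry can be checked by enumerating every draw order for a toy population (sizes below are arbitrary small choices): each position has the same unconditional success probability $s/N$.

```python
from itertools import permutations
from fractions import Fraction

N, s = 5, 2                            # toy population with s success items
pop = [1]*s + [0]*(N - s)
orders = list(permutations(range(N)))  # every draw order is equally likely
probs = [Fraction(sum(pop[o[pos]] for o in orders), len(orders))
         for pos in range(N)]
# every position has the same unconditional success probability s/N = 2/5
```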

ANSWER

The justification comes from the fact that the distribution of $(I_1, I_2, \dotsc, I_n)$ is invariant under permutation, i.e. $$ P(I_1=m_1,\dotsc, I_n = m_n) = P(I_{\pi(1)}=m_1,\dotsc, I_{\pi(n)} = m_n) \tag{0} $$ for any permutation $\pi$ of $\{1,\dotsc,n\}$, since $$ P(I_1=m_1,\dotsc, I_n = m_n) = \frac{s^{\underline{k}}\,(N-s)^{\underline{n-k}}}{N(N-1)\dotsb (N-n+1)} $$ where the underline denotes the falling factorial and $k$ is the number of $1$'s (successes) in $(m_1, \dotsc, m_n)$; the right-hand side depends on $(m_1, \dotsc, m_n)$ only through $k$. From equation $(0)$, by marginalizing out the other variables, it follows that the distributions of the $I_j$ are equal for $j=1,\dotsc, n$. But $$ P(I_1 = 1) = \frac{s}{N}, $$ from which the result follows.
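The falling-factorial formula for the joint distribution can be verified by direct enumeration on a toy example (sizes below are arbitrary small choices; `math.perm(a, b)` computes the falling factorial $a^{\underline{b}}$):

```python
from itertools import permutations
from fractions import Fraction
from math import perm  # perm(a, b) = a!/(a-b)!, the falling factorial

N, s, n = 6, 2, 3                         # toy population, successes, draws
pop = [1]*s + [0]*(N - s)
orders = list(permutations(range(N), n))  # all equally likely ordered draws
checks = []
for m in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]:
    k = sum(m)
    empirical = Fraction(sum(1 for o in orders
                             if tuple(pop[i] for i in o) == m), len(orders))
    formula = Fraction(perm(s, k) * perm(N - s, n - k), perm(N, n))
    checks.append(empirical == formula)
# every outcome pattern matches the falling-factorial formula
```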