Hypergeometric distribution question solving


Consider $N$ items, $N_1$ of which are "hot items". We select items, without replacement, until we have $n$ "hot items" ($1\le n \le N_1$, with $n$ a constant). Let $X$ be the random variable that counts the number of draws needed to get the $n$ "hot items". I need to calculate the mean and the variance of $X$.

So far I have this:

$$P[X=k] =\frac{ {N_1 \choose n}{N-N_1 \choose k-n} }{N \choose k}$$ So the minimum number of draws is $n$ (when every item I select is a "hot item") and the maximum is $N-N_1+n$ (when I select all of the "non-hot items" and then the $n$ "hot items"). This is something like a hypergeometric distribution, but not exactly: here the number of "hot items" $n$ is fixed and the number of draws $k$ is what varies, rather than the other way round. But when I try to calculate the mean of $X$, I get:

$$E[X]=\sum_{k=n}^{N-N_1+n} {k\frac{ {N_1 \choose n}{N-N_1 \choose k-n} }{N \choose k}}$$

and this is where I get lost. Any idea how to evaluate this sum, or is my approach wrong?


There are 2 best solutions below


The following is not a full solution but is too long for a comment.

First, I don't think that the pmf for $X$ is correct. Note that we keep drawing until we get $n$ hot items, so $X$ is the minimum number of draws needed to get $n$ successes (successes being hot items). Your pmf accounts for $n$ successes in $k$ trials but does not require that a success occur on the last trial. The pmf for $X$ should be $$P(X=k)= \frac{\binom{N_1}{n-1}\binom{N-N_1}{k-n}}{\binom{N}{k-1}}\times \frac{N_1-n+1}{N-k+1} $$ The first factor corresponds to $n-1$ successes in the first $k-1$ draws, and the second to a success on the final draw. Up to a different parametrization, I think that $X$ follows a negative hypergeometric distribution.
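As a quick sanity check of this point, here is a minimal Python sketch that sums both formulas over the support $k=n,\ldots,N-N_1+n$ using exact rational arithmetic. The function names and the values $N=10$, $N_1=4$, $n=2$ are illustrative assumptions, not part of the question.

```python
from fractions import Fraction
from math import comb

def pmf_question(k, N, N1, n):
    """Formula from the question: n hot items somewhere among k draws."""
    return Fraction(comb(N1, n) * comb(N - N1, k - n), comb(N, k))

def pmf_corrected(k, N, N1, n):
    """n-1 hot items in the first k-1 draws, then a hot item on draw k."""
    first = Fraction(comb(N1, n - 1) * comb(N - N1, k - n), comb(N, k - 1))
    last = Fraction(N1 - n + 1, N - k + 1)
    return first * last

# Illustrative values (assumed, not from the question).
N, N1, n = 10, 4, 2
support = range(n, N - N1 + n + 1)

total_question = sum(pmf_question(k, N, N1, n) for k in support)
total_corrected = sum(pmf_corrected(k, N, N1, n) for k in support)
print(total_question, total_corrected)
```

For these values the formula from the question sums to $11/5$, so it cannot be a pmf, while the corrected one sums to $1$.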


The random variable $X$ in your question does not have a hypergeometric distribution. To have $X=k$ for $k\geq n$, a "hot item" must appear on the last draw, number $k$, and the previous $k-1$ draws must contain exactly $n-1$ "hot items". So $$ \mathbb P(X=k)=\frac{\binom{N_1}{n-1}\binom{N-N_1}{k-n}}{\binom{N}{k-1}}\cdot \frac{N_1-n+1}{N-k+1}. $$ Using the identities $\binom{N_1}{n-1}(N_1-n+1)=n\binom{N_1}{n}$ and $\binom{N}{k-1}(N-k+1)=k\binom{N}{k}$, this can be rewritten as $$ \mathbb P(X=k)=\frac{\binom{N_1}{n}\binom{N-N_1}{k-n}}{\binom{N}{k}}\cdot \frac{n}{k}. $$ If you consider $Y=X-n$, the number of "non-hot items" drawn before you get $n$ "hot items", then $Y$ has a negative hypergeometric distribution.

Its expectation and variance are given on Wikipedia:

$$\mathbb E[Y] = \frac{n(N-N_1)}{N_1+1}, \quad \mathbb E[X]=\mathbb E[Y]+n$$ and $$\text{Var}(X)=\text{Var}(Y) = \frac{n(N-N_1)(N+1)(N_1-n+1)}{(N_1+1)^2(N_1+2)}.$$
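These closed forms can be checked against the pmf directly. The following is a minimal sketch, assuming the simplified pmf with the factor $n/k$ given above; the function names and the parameter values are illustrative choices, not from the question.

```python
from fractions import Fraction
from math import comb

def pmf_x(k, N, N1, n):
    """P(X = k): hypergeometric term times the factor n/k."""
    return Fraction(comb(N1, n) * comb(N - N1, k - n), comb(N, k)) * Fraction(n, k)

def moments_from_pmf(N, N1, n):
    """Mean and variance of X computed by summing over the support."""
    support = range(n, N - N1 + n + 1)
    mean = sum(k * pmf_x(k, N, N1, n) for k in support)
    second = sum(k * k * pmf_x(k, N, N1, n) for k in support)
    return mean, second - mean ** 2

def moments_closed_form(N, N1, n):
    """Mean and variance of X from the closed-form expressions."""
    mean = n + Fraction(n * (N - N1), N1 + 1)
    var = Fraction(n * (N - N1) * (N + 1) * (N1 - n + 1),
                   (N1 + 1) ** 2 * (N1 + 2))
    return mean, var

# Illustrative values (assumed, not from the question).
N, N1, n = 10, 4, 2
print(moments_from_pmf(N, N1, n))
print(moments_closed_form(N, N1, n))
```

For $N=10$, $N_1=4$, $n=2$ both routes give $\mathbb E[X]=22/5$ and $\text{Var}(X)=66/25$.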

Here is an alternative way to find the expectation and variance. Think of the $N_1$ "hot items" as white balls and the other $N-N_1$ items as black balls. We draw balls at random until we get $n$ white balls. Let $Y$ be the total number of black balls drawn.

Number the black balls $i=1,\ldots,N-N_1$ and introduce indicator random variables $Z_i$: $Z_i=1$ if the $i$th black ball is drawn before the $n$th white ball, and $Z_i=0$ otherwise.

To find $\mathbb P(Z_i=1)$, look only at $N_1+1$ balls: the $i$th black ball and all the white balls. All relative orders of these balls are equally likely, and we are interested in the arrangements in which the black ball takes one of the first $n$ places: $$ \underbrace{BWWW\ldots W}_{n+1}\ldots W,\quad \underbrace{WBW\ldots W}_{n+1}\ldots W, \quad \ldots,\quad \underbrace{ WW\ldots WBW}_{n+1}\ldots W $$ So $$ \mathbb P(Z_i=1) = \frac{n}{N_1+1} = \mathbb E[Z_i]. $$ Since $Y=Z_1+\ldots+Z_{N-N_1}$, $$ \mathbb E[Y] = \sum_{i=1}^{N-N_1} \mathbb E[Z_i] = (N-N_1)\frac{n}{N_1+1}. $$
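The value of $\mathbb P(Z_i=1)$ can be confirmed by brute force on a small case. Below is a minimal sketch that enumerates every ordering of the $N$ balls; the ball labelling, the function name and the values $N=6$, $N_1=3$, $n=2$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def indicator_probability(N, N1, n):
    """Exact P(Z_1 = 1) by enumerating all orderings of N distinct balls."""
    # Balls 0..N1-1 are white, balls N1..N-1 are black.
    # Ball N1 plays the role of black ball number 1.
    tracked = N1
    hits = total = 0
    for order in permutations(range(N)):
        whites_seen = 0
        for ball in order:
            if ball == tracked:
                hits += 1  # tracked black ball came before the n-th white ball
                break
            if ball < N1:
                whites_seen += 1
                if whites_seen == n:
                    break  # drawing stops at the n-th white ball
        total += 1
    return Fraction(hits, total)

# Illustrative values (assumed); kept small because N! orderings are enumerated.
N, N1, n = 6, 3, 2
print(indicator_probability(N, N1, n), Fraction(n, N1 + 1))
```

The enumeration grows as $N!$, so this is only practical for a handful of balls, but it does not rely on the "look only at $N_1+1$ balls" argument and so serves as an independent check.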

To calculate the variance, note that the $Z_i$ are dependent. So $$ \text{Var}(Y)= \sum_{i=1}^{N-N_1} \text{Var}(Z_i) + 2 \sum_{i<j} \text{Cov}(Z_i,Z_j) = (N-N_1)\text{Var}(Z_1) + (N-N_1)(N-N_1-1)\text{Cov}(Z_1,Z_2) \tag{1} $$ since all the variances are equal and all the pairwise covariances are equal.

We need $\mathbb P(Z_1=1, Z_2=1)=\frac{(n+1)n}{(N_1+2)(N_1+1)}$. Indeed, the 1st and 2nd black balls can occupy any two of the $N_1+2$ places in $(N_1+2)(N_1+1)$ ways, and in $(n+1)n$ of these both black balls come before the $n$th white ball (both must lie among the first $n+1$ places).

Then $$ \text{Cov}(Z_1,Z_2) = \mathbb E[Z_1Z_2] - \mathbb E[Z_1]\mathbb E[Z_2] = \frac{(n+1)n}{(N_1+2)(N_1+1)} - \frac{n^2}{(N_1+1)^2} = \frac{n(N_1-n+1)}{(N_1+1)^2(N_1+2)} $$ and $$ \text{Var}(Z_1) = \frac{n}{N_1+1} - \frac{n^2}{(N_1+1)^2}=\frac{n(N_1-n+1)}{(N_1+1)^2}. $$ Substituting these values into (1) gives $$ \begin{aligned} \text{Var}(Y) &= (N-N_1)\left[\frac{n(N_1-n+1)}{(N_1+1)^2}+(N-N_1-1)\frac{n(N_1-n+1)}{(N_1+1)^2(N_1+2)}\right] \\ &= \frac{(N-N_1)n(N_1-n+1)}{(N_1+1)^2(N_1+2)}\bigl[(N_1+2)+(N-N_1-1)\bigr] = \frac{(N-N_1)n(N_1-n+1)(N+1)}{(N_1+1)^2(N_1+2)}. \end{aligned} $$
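The algebra in this last step is easy to get wrong by hand, so here is a minimal sketch that evaluates formula (1) with exact fractions and compares it with the final closed form over a range of small parameters. The function names and the parameter ranges are illustrative assumptions.

```python
from fractions import Fraction

def variance_via_indicators(N, N1, n):
    """Var(Y) from formula (1): variances plus pairwise covariances."""
    b = N - N1                                            # number of black balls
    p = Fraction(n, N1 + 1)                               # P(Z_1 = 1)
    p_joint = Fraction((n + 1) * n, (N1 + 2) * (N1 + 1))  # P(Z_1 = 1, Z_2 = 1)
    var_z = p - p ** 2                                    # Var(Z_1)
    cov = p_joint - p ** 2                                # Cov(Z_1, Z_2)
    return b * var_z + b * (b - 1) * cov

def variance_closed_form(N, N1, n):
    """Final simplified expression for Var(Y)."""
    return Fraction((N - N1) * n * (N1 - n + 1) * (N + 1),
                    (N1 + 1) ** 2 * (N1 + 2))

# Compare the two expressions on every valid (N, N1, n) with N up to 12.
mismatches = [
    (N, N1, n)
    for N in range(1, 13)
    for N1 in range(1, N + 1)
    for n in range(1, N1 + 1)
    if variance_via_indicators(N, N1, n) != variance_closed_form(N, N1, n)
]
print(mismatches)  # an empty list means the two expressions agree everywhere tested
```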

This agrees exactly with the formula on Wikipedia.