Calculating $\mathbb{E}[N]$ for $N = \displaystyle \min_{n\in \mathbb{N}}\Big\{\sum_{i=1}^{n}{X_i\geq5000\Big\}}$, using Wald's lemma

191 Views Asked by At

Suppose I have a sequence of $i.i.d$ random variables: $X_1,X_2,... \sim Geom(p)$. That means that each of the $X_i$'s is holding an unknown random number of trials until a 'success'.

Since $0<p<1$, we know that $X_i$ has a finite expectation $\mathbb{E}[X_i] = \frac{1}{p}$, which also means that there exists an integer $N \in \mathbb{N}$ such that: $$N = \displaystyle \min_{n\in \mathbb{N}}\Big\{X_1+X_2+...+X_n= \sum_{i=1}^{n}{X_i\geq5000}\Big\}$$

We would like to calculate the expectation of this finite integer 'stopping time' $N$. From Wald's lemma:

If $X_i$ are i.i.d. with finite $\mathbb{E}[X_i] = \mu$, and N is a finite stopping time then: $\mathbb{E}\Big[\sum_{i=1}^{N}{X_i}\Big] = \mu\mathbb{E}[N]$.

My problem is how to deal with the 'greater-equal' ($\geq$) sign. Since we define $N = \displaystyle \min_{n\in \mathbb{N}}\Big\{\sum_{i=1}^{n}{X_i\geq5000\Big\}}$, this means that $X_N$ contibutes a number of trials with which the sum exceeds $5000$, but we dont know the exact sum.

My intuition is something like: if we take the the sum as the bare minimum, then $$\mathbb{E}\Big[\sum_{i=1}^{N}{X_i}\Big] = 5000= \mathbb{E}[N]\times \frac{1}{p} \to \mathbb{E}[N] = 5000p$$ But even if that's the case I'm having trouble justifying taking the sum as exactly 5000.

Another possible approach is to condition on $\sum_{i=1}^{N}{X_i}=k$ and take the expectation, but $k = 5000, 5001,...$ and I'm not sure how to formulate this, since $k$ is potentially unbounded ($k\in [5000,\infty)$), if that's even a valid approach.

I'd love some guidance please.

3

There are 3 best solutions below

6
On BEST ANSWER

I reached the same conclusion as Henry (+1) in a more roundabout manner. What follows is the gist of my thoughts:

Recall that a geometric random variable models the number of independent Bernoulli trials until a success occurs. Hence to every realization of a geometric random variable we can canonically associate a realization of a string of Bernoulli random variables. For example, let $X_1,\dots, X_{20}$ be a sequence of geometric random variables and $\omega$ be such that the $(X_1(\omega),\dots,X_{20}(\omega))$ is equal to $(6, 3, 2, 12, 1, 17, 1, 1, 1, 1, 2, 2, 3, 6, 1, 2, 1, 4, 5, 3)$. To this realization we can associate the following realization of a string of Bernoulli random variables:

$00000100101000000000001100000000000000001111101010010000011011000100001001$

Where $1$ denotes success. Note that the number of $1$'s in this string is equal to the number of geometric random variables. We are interested in the minimal number of geometric random variables (the minimal number of $1$'s) such that the string up to and including the last $1$ has length $\geq 5000$. Consider the first $5000$ entries in the string. Clearly, the number of $1$'s in the first $5000$ entries (or trials) can be written as a binomial random variable.

Let $Y$ denote the number of $1$'s in the first $5000$ entries of the random string that is obtained from the geometric random variables as described above. $Y$ is a random variable and we have $Y\sim \text{Bin}(5000,p)$. Denote by $Y_{5000}$ the outcome of the $5000$'th Bernoulli trial.

Claim. We have the following equality

$$N=Y \mathbf{1}({Y_{5000}=1})+(Y+1) \mathbf{1}(Y_{5000}=0), \qquad (1)$$

where $\mathbf{1}()$ is an indicator function. In particular, $$\mathsf E(N) = 4999p+1.$$

Proof. The reasoning is as follows. Consider the first $5000$ trials. Either we have a success on the $5000$'th trial, in which case we have that exactly $Y$ geometric RV have been added up to reach $5000$, and hence $N=Y$; or we do not have a success on the 5000'th trial. In this case we have to wait until the next success occurs for the current geometric RV, which will add $1$ to the tally of $Y$. So in this case we must have $N=Y+1$.

Rewriting $(1)$ as

$$N=Y+\mathbf{1}(Y_{5000}=0).$$

And taking expectations gives the result.
$\square$

2
On

Geometric distributions have the memorylessness property, which makes this much easier if you regard it as a sequence of attempts stopping with the first success at $5000$ or more attempts.

You can calculate the expected value of the number of attempts when you stop: once you have reached $4999$ attempts (it does not matter how many of these were successes or not), you expect $\frac1p$ more attempts until you stop, making the expected value of the sum $$\mathbb{E}\left[\sum\limits_{i=1}^{N}{X_i}\right]=4999+\frac1p$$

This makes the expected number of successes when you stop $$\mathbb{E}[N]=\dfrac{4999+\frac1p }{\frac1p} = 4999p+1$$

0
On

We can also calculate $\mathsf E(N)$ directly. We let $\mathbb N=\mathbb Z_{\ge 1}$ and assume that $(X_n)_{n\in\mathbb N}$ is a sequence of iid random variables such that $\mathsf P(X_1 = m) = p (1-p)^{m-1}$ for some $p\in[0,1]$ and all $m\in\mathbb N$.

For $x\in\mathbb N$, let

$$N_x=\min\left\{n\in\mathbb N:\sum_{i=1}^{n}{X_i\geq x}\right\}.$$

Then

\begin{equation*}\begin{split} \mathsf E(N_x) &=\sum_{r=1}^\infty \mathsf P(N_x\ge r) \\ &=1+\sum_{r=2}^\infty \mathsf P\left(\sum_{m=1}^{r-1}X_m<x\right)\\ &= 1+\sum_{r=2}^\infty\sum_{l=r-1}^{x-1} \binom{l-1}{r-2} p^{r-1} (1-p)^{l-r+1} \\ &=1+\sum_{l=1}^{x-1}\sum_{r=2}^{l+1} \binom{l-1}{r-2} p^{r-1} (1-p)^{l-r+1} \\ &=1+\sum_{l=1}^{x-1}p\sum_{r=0}^{l-1}\binom{l-1}{r} p^r (1-p)^{l-1-r} \\ &=1+\sum_{l=1}^{x-1} p (p+(1-p))^{l-1} \\ &= 1 + p(x-1). \end{split}\end{equation*}


Comments on equalities:

  1. Layer cake representation;
  2. We have $N_x\ge r$ iff $\sum_{m=1}^{r-1} X_m<x$;
  3. See How to compute the sum of random variables of geometric distribution;
  4. Re-arranging terms (can be justified with Fubini+counting measure but I expect there to also be an easier justification that I am too lazy to come up with);
  5. Changing index $r-2\to r$;
  6. Binomial Theorem;
  7. Direct simplification.

Setting $x=5000$ gives the special case $\mathsf E(N)=4999 p +1$.