Determine the expected value of a geometric distribution given some generic underlying distribution.

167 Views Asked by At

This is a variation of the standard waiting time problem. Suppose you have a sequence of variables

$$X_0,X_1,X_2,\ldots \overset{iid}{\sim} F(x)$$

where $F(x)$ is continuous.

And random variable $N$ where $N$ is the first value $n$ for which $X_n > X_0$. I would like to find $\Pr[N=n]$ and $E[N]$. So I did

$$\Pr[N=n] = \Pr[X_1,\ldots,X_{n-1} \le X_0] = \Pr[X_1 \le X_0] \cdots \Pr[X_{n-1} \le X_0] = F(X_0)^{n-1}.$$

And the expectation is: $$E[N] = \sum_{n \ge 1} n F(X_0)^{n-1} = \frac{1}{(F(X_0)-1)^2}.$$

But the expected value I computed does not need to be an integer. So I feel like something went wrong, what gives?

2

There are 2 best solutions below

6
On BEST ANSWER

The following, quoted from the question, is wrong: $$ \Pr[N=n] = \Pr[X_1,\ldots,X_{n-1} \le X_0] = \Pr[X_1 \le X_0] \cdots \Pr[X_{n-1} \le X_0] = F(X_0)^{n-1}. $$ This would be correct if it said $$ \Pr[N=n\mid X_0] = \Pr[X_1,\ldots,X_{n-1} \le X_0\mid X_0] = \Pr[X_1 \le X_0\mid X_0] \cdots \Pr[X_{n-1} \le X_0\mid X_0] = G(X_0)^{n-1} $$ where $G$ is the conditional c.d.f. given $X_0$.

If you want $\Pr(N=n)$ and you've got $\Pr(N=n\mid X_0)$, then that is some function of $X_0$, and is thus a random variable. The expected value of that random variable is $\Pr(N=n)$.

The answer posted by karakusc says correctly that $\Pr(X_n>X_0)=1/2$, but then goes on to say that $\Pr(X_1>X_0\ \&\ \cdots\ \&\ X_n>X_0)= (1/2)^n$. This mistakenly assumes the events $X_k>X_0$ for $k=1,\ldots,n$ are independent. They are not independent since they all depend on $X_0$. It would be correct to say that $\Pr(X_1>X_0\ \&\ \cdots\ \&\ X_n>X_0\mid X_0)= \left( \Pr(X_1\mid X_0) \right)^n$, since the events involved are conditionally independent given the value of $X_0$. But the value of $\Pr(X_k>X_0\mid X_0)$ depends on $X_0$ in a way that itself depends on what the distribution of $X_0$ is.

PS: I have now begun to think that I missed an opportunity to progress further on this than I did in the answer above. Stand by$\text{ }\ldots$

PPS: $\ldots$ and now I wonder why no one else pointed that out.

Very well, then, $X_0,X_1,\ldots,X_n$ are i.i.d. and their c.d.f. $F$ is continuous.

We have $N=n$ precisely if $X_1,\ldots,X_{n-1} \le X_0 < X_n$. That is the probability that a uniformly distributed random permutation of $0,1,\ldots,n$ puts $0$ and $n$ in the last two positions, in that order. And that is $$ \frac{\text{number of permutations of $1,\ldots,n-1$}}{\text{number of permutations of $0,1,\ldots,n$}} = \frac{(n-1)!}{(n+1)!} = \frac 1 {n(n+1)}. $$

PPPS: $\ldots$ and now it has occured to me that there is a yet simpler way to characterize the distribution of $N$. The probability that $X_0$ is the largest among $X_0,X_1,\ldots,X_n$ is $1/(n+1)$, since all of these random variables are equally likely to be the biggest. Therefore $\Pr(N>n)=\dfrac 1 {n+1}$. And so $$ \Pr(N=n) + \Pr(N>n) = \Pr(N>n-1) = \frac 1 n, $$ hence $$ \Pr(N=n) + \frac 1 {n-1} = \frac 1 n $$ and finally $$ \Pr(N=n) = \frac 1 {n-1} - \frac 1 n. $$

7
On

Since $F(x)$ is continuous, $P(X_n=X_m)=0$ for all $m,n$, since otherwise $F(x)$ would need to have a point mass, which means it cannot be continuous. Then since $\{X_n \}$ is i.i.d., $P(X_n>X_0)=\frac{1}{2}$ for all $n$, by symmetry.

Therefore $N$ is simply a geometric random variable with parameter $\frac{1}{2}$, and $\mathbb{E}[N]=2$.

Note: This result is wrong since the events $\{ X_n>X_0\}$ are not independent, but I would like to keep it since it contains a correct side observation in the first paragraph. See Michael Hardy's post for the correct answer.