Expectation of maximum from $n$ draws.


We have $n$ guys walking down the street, and each can find $1, 2, \ldots,$ or $n$ dollars in the street. (The number of guys and the number of possible dollar amounts are both $n$.)

Each of them finds $1$ dollar with probability $\dfrac{1}{2}$, $2$ dollars with probability $\dfrac{1}{4}$, $3$ dollars with probability $\dfrac{1}{8}$, and so on. That is, he finds $x$ dollars with probability $\dfrac{1}{2^x}$. (The finds are independent across guys; think of them as walking down different streets. The remaining probability mass $1/2^n$ can be assigned to finding $n$ (or $0$) dollars, whichever is more convenient.)

In expectation, every guy finds less than $2$ dollars. But: what is the expected number of dollars that the luckiest guy gets? That is, if $x_1, \ldots, x_n$ is the number of dollars that the guys find, what is $E[\max(x_1, \ldots, x_n)]$?

Edit: I am interested in obtaining a simple expression for the expectation OR a good upper bound.

Idea: Simulations suggest that the answer is around $\log(n)$.
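For reference, here is a minimal simulation sketch of the model (the function names are mine; the leftover mass $1/2^n$ is folded into the outcome $n$, as the problem allows):

```python
import math
import random

def draw_dollars(n, rng):
    """One guy's find: x dollars with probability 1/2^x for x < n;
    the leftover mass 1/2^n is folded into the outcome n."""
    x = 1
    while x < n and rng.random() < 0.5:
        x += 1
    return x

def simulate_expected_max(n, trials=2000, seed=0):
    """Monte Carlo estimate of E[max(x_1, ..., x_n)]."""
    rng = random.Random(seed)
    return sum(max(draw_dollars(n, rng) for _ in range(n))
               for _ in range(trials)) / trials

# The estimate grows like log2(n): roughly 11.3 for n = 1000,
# versus log2(1000) ~ 10.
```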

Edit: I tried solving it using Lulu's suggestion. Let $M=\max(x_1, \ldots, x_n)$. Then

$E[M]=\sum_{i=1}^n Pr(M=i)\cdot i$, where $Pr(M=i)=Pr(M\leq i)-Pr(M\leq i-1)$ and $Pr(M\leq i)=\left(\frac{2^i-1}{2^i}\right)^n$.

$Pr(M=1)=\frac{1}{2^n}$.

$Pr(M=2)=\frac{3^n}{4^n}-\frac{1}{2^n}$.

$Pr(M=3)=\frac{7^n}{8^n}-\frac{3^n}{4^n}$.

$Pr(M=4)=\frac{15^n}{16^n}-\frac{7^n}{8^n}$.

This implies that $E[M]=1 \cdot \frac{1}{2^n} + 2 \cdot (\frac{3^n}{4^n}-\frac{1}{2^n}) + 3 \cdot (\frac{7^n}{8^n}-\frac{3^n}{4^n}) + 4 \cdot (\frac{15^n}{16^n}-\frac{7^n}{8^n}) + \ldots$.

But I do not see how that expression converges to something easy to work with - or close to $\log (n)$.

Edit 2: A different approach, using $E[M]= \sum_{k=1}^\infty Pr(M\geq k)$, gives

$E[M]=\sum_{k=1}^\infty \left[1- \left( \frac{2^{k-1}-1}{2^{k-1}}\right)^n\right]$, but I am not sure how to expand this term to make it approximately equal to $\log(n)$.
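Both routes to $E[M]$ can be checked numerically. A sketch, assuming the truncated model (leftover mass $1/2^n$ assigned to $n$, so that all sums are finite; the helper names are mine):

```python
import math

def p_max_le(k, n):
    """P(M <= k) in the truncated model: (1 - 1/2^k)^n for k < n, and 1 at k = n."""
    return 1.0 if k >= n else (1.0 - 2.0 ** (-k)) ** n

def e_max_via_pmf(n):
    """E[M] = sum_i i * Pr(M = i), with Pr(M = i) = P(M <= i) - P(M <= i-1)."""
    return sum(i * (p_max_le(i, n) - p_max_le(i - 1, n)) for i in range(1, n + 1))

def e_max_via_tail(n):
    """E[M] = sum_{k >= 1} Pr(M >= k) = sum_{k=1}^{n} (1 - P(M <= k-1))."""
    return sum(1.0 - p_max_le(k - 1, n) for k in range(1, n + 1))
```

The two sums telescope into each other, so they agree to rounding error, and for large $n$ both sit within a bounded constant of $\log_2(n)$.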


There are 2 best solutions below


I find it easier to see this problem as "$1+$ the expected maximum number of consecutive heads" over $n$ series of coin flips. By "consecutive heads" I mean that we stop at the first tail and write down the number of heads we had. The $1+$ is there because the distribution is shifted by $1$ (you get $0$ heads with probability $1/2$, $1$ with probability $1/4$, $2$ with probability $1/8$, etc.).

We end up with a longest-success-run problem. Denote by $L_n$ the longest-run random variable, $L_n = \max(x_1,\dots,x_n)$. Each run length $x_i$ follows a geometric distribution $\mathbb{P}(x_i=k) = pq^{k} = 1/2^{k+1}$, with $p=q=1/2$ here and $k \in \mathbb{N}$.

Based on that, we may argue that a qualitative answer for the expected length of the longest run is the $l$ such that $np^lq = 1$ (a run of length $l$ is expected to occur about once). Solving for $l$, we have $$l = -\dfrac{\log\big(n(1-p)\big)}{\log p} = \dfrac{\log(n/2)}{\log 2} = \log_2(n/2).$$
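This heuristic is easy to sanity-check by simulation (a sketch with helper names of my own; runs are geometric on $\{0,1,2,\dots\}$ as in this answer):

```python
import math
import random

def run_length(rng):
    """Count consecutive heads before the first tail: P(k) = 1/2^(k+1), k >= 0."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k

def mean_longest_run(n, trials=2000, seed=0):
    """Monte Carlo estimate of E[L_n], the longest of n runs."""
    rng = random.Random(seed)
    return sum(max(run_length(rng) for _ in range(n))
               for _ in range(trials)) / trials
```

In simulations the heuristic $\log_2(n/2)$ lands roughly one unit below the observed mean, consistent with the Gumbel spread around it.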

This article establishes that the spread between this estimate and the true $\mathbb{E}[L_n]$ follows a Gumbel distribution with $\mu=0$ and $\beta = 1/\log(1/p)$, which allows one to draw confidence intervals. There is no closed form, though.

Do not forget to add $1$ to get back to the initial problem. That gives an expected maximum of approximately $1+\log_2(n/2) = \log_2(n)$.

This problem was already tackled here.


Each of the $X_i$ (the number of dollars found) has a truncated geometric distribution. For these, the same result holds as for the usual geometric distribution, namely that $\min(X_i, X_j)\sim Geo(1-(1-p)^2)$, and in general, for $k$ of those variables, $$\min(X_{i_1},X_{i_2},\ldots,X_{i_k})\sim Geo\big(1-(1-p)^k\big)$$

Now we can use the min–max formula for expectations (yet another inclusion–exclusion principle), together with $\mathbb E[Geo(q)]=1/q$: $$\mathbb E[\max(X_1,\ldots,X_n)]=n\cdot\mathbb E[X_1]+\sum_{m=2}^n(-1)^{m+1}\binom nm\cdot \mathbb E[\min(X_1,\ldots,X_m)] \\=\sum_{k=1}^n\binom nk \frac{(-1)^{k+1}}{1-(1-p)^k}=\sum_{k=1}^n\binom nk \frac{(-1)^{k+1}}{1-1/2^k}$$
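The alternating binomial sum cancels catastrophically in floating point, so it is worth evaluating in exact rational arithmetic and cross-checking against the tail-sum form. A sketch for the untruncated geometric on $\{1,2,\dots\}$ with $p=1/2$ (helper names are mine):

```python
from fractions import Fraction
from math import comb

def e_max_alternating(n):
    """sum_{k=1}^n C(n,k) (-1)^(k+1) / (1 - 1/2^k), in exact rationals."""
    total = Fraction(0)
    for k in range(1, n + 1):
        total += Fraction(comb(n, k) * (-1) ** (k + 1)) / (1 - Fraction(1, 2 ** k))
    return total

def e_max_tail(n, terms=200):
    """E[max] = sum_{k>=1} [1 - (1 - 2^(1-k))^n] for geometrics on {1, 2, ...}."""
    return sum(1.0 - (1.0 - 2.0 ** (1 - k)) ** n for k in range(1, terms + 1))
```

With `Fraction` the alternating sum stays exact for any $n$, while a naive float evaluation degrades once $n$ reaches a few dozen.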

Edit: The Fisher–Tippett–Gnedenko extreme value theorem indeed implies that $T_n:=\max(X_1,\ldots,X_n)$, suitably normalized, converges in distribution to a standard Gumbel distribution, i.e.

$$\lim_{n\rightarrow \infty}\mathbb P\left((1-p)\cdot\left(T_n+\frac{\ln(n)}{\ln(1-p)}\right)\leq x\right)=F_G(x)$$ where $G\sim Gumbel(0,1)$.

So indeed, $$\mathbb E[T_n]\approx \frac{\gamma_E}{1-p}-\frac{\ln(n)}{\ln(1-p)}$$ for large $n$, where $\gamma_E$ is the Euler–Mascheroni constant. The convergence is, however, not very fast, so $n$ needs to be quite large. I could only test numerically in R, where the exact alternating sum became numerically unstable for $n>50$.
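The approximation can be compared against a numerically stable evaluation of the exact expectation via the tail sum (a sketch; the geometric here lives on $\{1,2,\dots\}$ with $p=1/2$, and the small constant offset it reveals is consistent with the caveat below about $a_n$):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def e_max_exact(n, terms=200):
    """Exact E[T_n] via the tail sum; stable even for large n."""
    return sum(1.0 - (1.0 - 2.0 ** (1 - k)) ** n for k in range(1, terms + 1))

def e_max_gumbel(n, p=0.5):
    """The Gumbel-based approximation gamma/(1-p) - ln(n)/ln(1-p)."""
    return EULER_GAMMA / (1 - p) - math.log(n) / math.log(1 - p)

# For n = 1000 the two differ by a small constant offset (about 0.2),
# not by anything growing in n.
```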

You can find the theorem on Wikipedia here: https://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem.

The second case is the one applicable here. One "only" needs to check all the assumptions.

It is possible, though, that I messed up the computation of $a_n=1-p$. Also, using $\ln(n+1)$ instead of $\ln(n)$ makes the approximation a little better, although it naturally makes no difference for the convergence.