Dynamic urn problem


Consider an urn with $k$ balls, labeled $1$ through $k$. One step is defined as the following procedure: draw one ball uniformly at random from the urn, remove all balls with labels greater than the label of the drawn ball, and put the drawn ball back. (Side remark: the number of balls may stay the same after a step.)

Question: What is the expected value of the number of steps $n$ needed to draw the ball with label $1$?
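As a sanity check, here is a minimal Monte Carlo sketch of the process (names are ad hoc); the estimate agrees with the closed form $1+H_{k-1}$ derived in the answers below.

```python
import random

def steps_until_one(k, rng):
    # After any step the surviving labels are exactly 1..m, so it
    # suffices to track the number m of balls still in the urn.
    m, steps = k, 0
    while True:
        steps += 1
        drawn = rng.randint(1, m)  # uniform draw over labels 1..m
        if drawn == 1:
            return steps
        m = drawn  # labels > drawn are removed; the drawn ball goes back

rng = random.Random(1)
k, trials = 6, 200_000
est = sum(steps_until_one(k, rng) for _ in range(trials)) / trials
exact = 1 + sum(1 / i for i in range(1, k))  # 1 + H_{k-1}, cf. the answers
print(est, exact)
```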

Answer (so far): Looking at the problem via tree diagrams, we can conclude that the cumulative probability of drawing the ball labeled $1$ within $m+1$ steps, starting with $k$ balls, is given by $S(m,k)/k$ with: $$S(m,k_0)=\sum_{k_1=1}^{k_0}\cdots\sum_{k_m=1}^{k_{m-1}}\Bigl(\prod_{j=1}^{m}k_j\Bigr)^{-1}.$$

Replacing $1/i$ by $\int_0^1dx\;x^{i-1}$, the resulting terminating geometric series $\sum_{i=0}^{k-1}x^i$ by $\frac{1-x^k}{1-x}$, and integrating out via $\int_0^xdy\;(1-y)^{-1}=\ln\frac1{1-x}$, this can be simplified to

$$S(n+1,k)\cdot n!=\int_0^1\frac{1-x^k}{1-x}\,\ln^n\!\Bigl(\frac1{1-x}\Bigr)\,dx,$$

where, of course, whether the word 'simplified' applies is a matter of personal preference. This form only remains valid for nonnegative $n$, and since the number of steps is $n+2$, we need to know in addition that $S(0,k)=1$ (i.e. that the probability of finding $1$ in the first step is $1/k$).
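This identity can be cross-checked numerically (hypothetical helper names): substituting $x=1-e^{-u}$ turns the right-hand side into $\int_0^\infty\bigl(1-(1-e^{-u})^k\bigr)u^n\,du$, which removes the endpoint singularity and can be handled with Simpson's rule.

```python
import math

def S(m, k0):
    # S(0, k) = 1;  S(m, k0) = sum_{k1=1}^{k0} S(m-1, k1) / k1
    if m == 0:
        return 1.0
    return sum(S(m - 1, k1) / k1 for k1 in range(1, k0 + 1))

def integral(n, k, upper=60.0, steps=6000):
    # Simpson's rule for ∫_0^∞ (1 - (1 - e^{-u})^k) u^n du,
    # truncated at `upper` (the integrand decays like k e^{-u} u^n).
    h = upper / steps
    def f(u):
        return (1.0 - (1.0 - math.exp(-u)) ** k) * u ** n
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

for n in range(0, 4):
    for k in (2, 3, 5):
        lhs = S(n + 1, k) * math.factorial(n)
        rhs = integral(n, k)
        assert abs(lhs - rhs) < 1e-6, (n, k, lhs, rhs)
```

For example, $n=0$ recovers $S(1,k)=H_k=\int_0^\infty\bigl(1-(1-e^{-u})^k\bigr)du$, the expected maximum of $k$ standard exponentials.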

From all that we can in principle find the expectation value by noting that, for a discrete random variable with cumulative distribution function $F(n)$, the mean can be written as $E=\sum_{n=0}^\infty \bigl(1-F(n)\bigr)$.

Unfortunately I'm not able to calculate this sum. $F$ relates to $S$ by $F(n)=S(n-1,k)/k$.
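Although the sum resists a closed-form attack here, it is easy to evaluate numerically (a sketch with ad-hoc names; the recursion $S(m,j)=\sum_{i=1}^{j}S(m-1,i)/i$ is iterated as a running prefix sum, and the result matches the closed form $1+H_{k-1}$ found in the answers).

```python
def expected_steps(k, n_max=400):
    # Iteratively build S(m, j) for j = 1..k from S(0, j) = 1 and
    # S(m, j) = sum_{i=1}^{j} S(m-1, i) / i, and accumulate
    # E = sum_{n>=0} (1 - F(n)) with F(n) = S(n-1, k)/k for n >= 1.
    s = [1.0] * (k + 1)           # s[j] = S(0, j); s[0] unused
    total = 1.0                   # the n = 0 term, since F(0) = 0
    for n in range(1, n_max + 1):
        total += 1.0 - s[k] / k   # s currently holds S(n-1, ·)
        acc, new = 0.0, [0.0] * (k + 1)
        for j in range(1, k + 1):   # advance to S(n, ·) via prefix sums
            acc += s[j] / j
            new[j] = acc
        s = new
    return total

k = 5
print(expected_steps(k), 1 + sum(1 / i for i in range(1, k)))
```

The truncation is harmless: the tail $1-F(n)$ decays at least like $(1-1/k)^n$, since each step draws ball $1$ with probability at least $1/k$.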


EDIT: The answer provided by WhatsUp is correct. I was unsatisfied that the formula $E_m = 1+\frac1m\sum_{i=1}^mE_i$ seemed to come out of nowhere, so I derived it myself.

From tree diagrams one sees that the probability $P_k(n)$ of finding $1$ in the $n$-th step, with $k$ balls before the first step, is given by $1/k$ if $n=1$ and by $P_k(n+1)=\frac1k\sum_{k_1=2}^k\cdots\sum_{k_n=2}^{k_{n-1}}\prod_{j=1}^n\frac1{k_j}$ otherwise. This directly gives $P_k(n)=\frac1k\sum_{i=2}^kP_i(n-1)$.

Inserting this into the mean formula $E_k=\sum_{n=1}^\infty n\cdot P_k(n)$ gives $$\frac1k+\frac1k\sum_{n=1}^\infty(n+1)\sum_{i=2}^kP_i(n)\\ =\frac1k + \frac1k\sum_{i=2}^k\sum_{n=1}^\infty n\cdot P_i(n) + \sum_{n=2}^\infty\frac1k\sum_{i=2}^kP_i(n-1).$$ This equals $1+\frac1k\sum_{i=2}^kE_i$, since the last summand is $\sum_{n=2}^\infty P_k(n)=1-P_k(1)=1-\frac1k$.


Of course $E_k=\frac1{k-1}\bigl(k+\sum_{i=2}^{k-1}E_i\bigr)$ then follows as outlined by WhatsUp, and $E_k=1+\sum_{i=1}^{k-1}\frac1i$ can be shown by induction.
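The recursion can be checked against the closed form in exact arithmetic (a small sketch using Python's `fractions`; here $E_1=1$, since with one ball the first draw already finds $1$):

```python
from fractions import Fraction

# E_1 = 1;  E_k = (k + sum_{i=2}^{k-1} E_i) / (k - 1) for k > 1,
# compared with the closed form E_k = 1 + H_{k-1}.
E = {1: Fraction(1)}
for k in range(2, 21):
    E[k] = (k + sum(E[i] for i in range(2, k))) / Fraction(k - 1)

for k in range(1, 21):
    closed = 1 + sum(Fraction(1, i) for i in range(1, k))
    assert E[k] == closed, k

print(E[6])  # 197/60
```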

There are 3 answers below.

Accepted answer (by WhatsUp):

For $m\geq 1$, let $E_m$ be the expected number of steps until only one ball remains, if one starts with $m$ balls.

We then have $E_1 = 0$ and $E_m = 1 + \frac 1 m \sum_{i = 1}^m E_i$ for $m > 1$.

This gives $E_m = \frac 1{m - 1}\left(m + \sum_{i = 1}^{m - 1} E_i\right)$ for $m > 1$.

Therefore the sequence $(E_m)_{m\geq 1}$ looks like: $$0, 2, \frac{5}{2}, \frac{17}{6}, \frac{37}{12}, \frac{197}{60}, \frac{69}{20}, \dotsc$$

Guess what it is? It's simply $E_m = 1 + \sum_{i = 1}^{m - 1}\frac 1 i$, for any $m > 1$.


Proof by induction: For $m = 2$, it is clear, as $E_2 = 2 = 1 + \frac 1 1$.

Suppose it's true for $m$. Then we have, for $m + 1$:

\begin{eqnarray*} E_{m + 1} &=& \frac 1 m\left(m + 1 + \sum_{i = 1}^m E_i\right) \\ &=& 1 + \frac 1 m + \frac 1 m \sum_{i = 2}^m\left(1 + \sum_{j = 1}^{i - 1}\frac 1 j\right)\\ &=& 2 + \frac 1 m \sum_{i = 2}^m\sum_{j = 1}^{i - 1}\frac 1 j\\ &=& 2 + \frac 1 m \sum_{j = 1}^{m - 1}\sum_{i = j + 1}^m\frac 1 j\\ &=& 2 + \frac 1 m \sum_{j = 1}^{m - 1}\frac{m - j}{j}\\ &=& 1 + \frac 1 m + \sum_{j = 1}^{m - 1} \frac 1 j\\ &=& 1 + \sum_{j = 1}^m\frac 1 j. \end{eqnarray*}


So the expected number of steps needed to draw the ball with label 1 is exactly $1 + \sum_{i = 1}^{k - 1}\frac 1 i$, when we start with $k$ balls.

For $k > 1$ this is the same as the expected number of steps until only one ball remains, and for $k = 1$ it is also valid, since we need to perform $1$ draw to get that ball.

Answer (by joriki):

In WhatsUp’s nice proof by induction, the harmonic numbers seem to somewhat magically appear out of some algebraic manipulations. We can make some probabilistic sense of this algebra as follows.

The expected number of draws before we draw the ball labeled $1$ is the sum, over all the other balls, of the expected number of times each of them is drawn. We can prove by induction on $k$ that the ball labeled $j$ (for $j\geq2$) is expected to be drawn $\frac1{j-1}$ times, independently of $k$. This trivially holds for $k=1$ (as in this case there are no other balls). Assume that it holds up to $k-1$ balls. If we have $k$ balls, the ball labeled $j$ will be drawn on the first step with probability $\frac1k$. It will still be there as one of $k$ balls with probability $\frac1k$ (if the ball labeled $k$ is drawn), and it will still be there as one of fewer than $k$ balls with probability $\frac{k-j}k$ (if one of the balls labeled $j$ through $k-1$ is drawn). Thus the number $E$ of times that we expect to draw it satisfies

$$ E=\frac1k+\frac1k\cdot E+\frac{k-j}k\cdot\frac1{j-1}\;, $$

with solution $E=\frac1{j-1}$. Summing this over all other balls yields $\sum_{j=2}^k\frac1{j-1}=\sum_{j=1}^{k-1}\frac1j=H_{k-1}$, and then we have to add the $1$ time that we are certain to draw the ball labeled $1$, for a total of $H_{k-1}+1$.
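This per-ball claim can be checked empirically (a rough simulation sketch with ad-hoc names; the averages carry Monte Carlo noise):

```python
import random

def draw_counts(k, rng):
    # Run the process once; count how many times each label is drawn,
    # up to and including the first draw of ball 1.
    counts = [0] * (k + 1)
    m = k
    while True:
        d = rng.randint(1, m)
        counts[d] += 1
        if d == 1:
            return counts
        m = d  # labels > d are removed

rng = random.Random(42)
k, trials = 6, 300_000
tot = [0] * (k + 1)
for _ in range(trials):
    c = draw_counts(k, rng)
    for j in range(1, k + 1):
        tot[j] += c[j]

means = [tot[j] / trials for j in range(1, k + 1)]
# ball 1 is drawn exactly once; ball j (j >= 2) about 1/(j-1) times
print(means)
```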

Answer (by the original poster):

WhatsUp's and joriki's answers both have in common that they characterise the random variable in question (the number of draws needed to draw $1$) only by giving its mean. In my previous attempt I tried to give a full characterisation of this random variable and arrived at its cumulative probability function. From there I couldn't go further. Noting that the probability $P_k(n)$ of needing exactly $n$ steps with $k$ balls before the first step is given recursively by $P_k(n)=\frac1k\sum_{i=2}^k P_i(n-1)$ and $P_k(1)=1/k$ not only opens up the possibility of deriving WhatsUp's solution (as noted at the end of the edit in my post) but also allows one to derive the generating functions.


The factorial moment generating function $M(t)$ is the expected value $E[t^N]=\sum_{n=0}^\infty t^n P(n)$. The factorial moments can be obtained from it by $$ E[N(N-1)\cdots(N-k+1)]=\lim_{t\uparrow1}\frac{d^k}{dt^k}M(t), $$ so for example $E=M'(1)$ and $\operatorname{Var}=M''(1)+M'(1)-M'(1)^2$.


For the present random variable, $M_k(t) = \frac{(k-1)!\;\cdot\; t}{(k-t)(k-1-t)\cdots(2-t)}$ is the factorial moment generating function for $k$ balls before the first step. From this, the already known $E_k=M_k'(1)=1+\sum_{i=1}^{k-1}\frac1i$ follows.

Proof: First, the defining sum satisfies a recursion: \begin{align*} M_k(t) &= \sum_{n=1}^\infty t^nP_k(n) \\ &= t\cdot\frac1k + \sum_{n=2}^\infty t^n\cdot\frac1k\sum_{i=2}^kP_i(n-1)\\ &= \frac tk\Bigl(1+\sum_{n=1}^\infty t^n\sum_{i=2}^k P_i(n) \Bigr)\\ &= \frac tk\Bigl(1+\sum_{i=2}^k M_i(t)\Bigr)\\ \Leftrightarrow\qquad M_k(t) &= \frac t{k-t}\Bigl(1+\sum_{i=2}^{k-1}M_i(t)\Bigr). \end{align*} Now induction on $k$. Base case $k=2$: $M_2(t)=\frac t{2-t}$. Step: \begin{align*} M_{k+1}(t) &= \frac t{k+1-t}\Bigl(1+\sum_{i=2}^kM_i(t)\Bigr) \\ &= \frac t{k+1-t}\Bigl(1+M_k(t) + \frac {k-t}t M_k(t) - 1\Bigr) \\ &= \frac t{k+1-t}\cdot\frac{(k-1)!\cdot t}{(k-t)\cdots(2-t)}\cdot\frac{t+(k-t)}t\\ &= \frac{k!\cdot t}{(k+1-t)(k-t)\cdots(2-t)}. \end{align*}
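The closed form can also be checked numerically against the recursion for $P_k(n)$ (ad-hoc names; the series is truncated, which is safe since $P_k(n)$ decays geometrically):

```python
import math

def pgf_via_recursion(k, t, n_max=300):
    # P_j(1) = 1/j;  P_j(n) = (1/j) * sum_{i=2}^{j} P_i(n-1).
    # Accumulates the truncated series sum_n t^n P_k(n).
    p = [0.0] + [1.0 / j for j in range(1, k + 1)]  # p[j] = P_j(1)
    total, tn = t * p[k], t
    for n in range(2, n_max + 1):
        new = [0.0] * (k + 1)
        for j in range(1, k + 1):
            new[j] = sum(p[i] for i in range(2, j + 1)) / j
        p = new
        tn *= t
        total += tn * p[k]
    return total

def pgf_closed(k, t):
    # (k-1)! * t / ((k-t)(k-1-t)...(2-t))
    den = 1.0
    for i in range(2, k + 1):
        den *= (i - t)
    return math.factorial(k - 1) * t / den

for k in (2, 4, 6):
    for t in (0.3, 0.7, 0.99):
        assert abs(pgf_via_recursion(k, t) - pgf_closed(k, t)) < 1e-9
```

For $k=2$ this reduces to the geometric series $\sum_n t^n 2^{-n}=\frac t{2-t}$, in line with the base case above.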


EDIT: We can go further.

The generating function of a sum of finitely many independent random variables is the product of their generating functions.

Therefore we can think of the random variable in question as a sum $X=\sum_{i=1}^k X_i$ where $$ M_{X_i}(t)=\begin{cases} t & i=1\\ \frac{i-1}{i-t} & i\neq 1. \end{cases} $$ Here $X_1=1$ is constant and, for $i\neq1$, $X_i$ is geometrically distributed with $P(n)=(1-p)p^n$ where $p=1/i$. joriki's answer suggests interpreting these $X_i$ as the number of times the ball labeled $i$ is drawn.
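This decomposition can be simulated directly (a sketch; `geom` samples $P(n)=(1-p)p^n$ by counting successes of probability $p$):

```python
import random

rng = random.Random(7)
k, trials = 6, 200_000

def geom(p):
    # P(G = n) = (1 - p) p^n for n >= 0, mean p/(1-p) = 1/(i-1) for p = 1/i
    n = 0
    while rng.random() < p:
        n += 1
    return n

total = 0
for _ in range(trials):
    # X = 1 + sum_{i=2}^k G_i with G_i geometric of parameter p = 1/i
    total += 1 + sum(geom(1 / i) for i in range(2, k + 1))

mean = total / trials
exact = 1 + sum(1 / i for i in range(1, k))  # 1 + H_{k-1}
print(mean, exact)
```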