Erasing numbers from the front of the row

265 Views Asked by At

Numbers $1,2,\ldots,k$ are written in this order in a row. For $i=1,\ldots,k$, in the $i$th step, a random variable $V_i$ is drawn uniformly from the interval $[0,2i]$. If $V_i$ is greater than the first remaining number, that number is erased. What is the expected number of numbers that will be erased?

For example, if $k=1$, then we have the number $1$ and $V_1$ drawn from $[0,2]$, so $1/2$ numbers will be erased in expectation.

If $k=2$, then with probability $1/2$ we have $V_1>1$ and it is 50-50 whether the second number is erased. Otherwise $V_1<1$ and with probability $3/4$ the first number is erased. So the answer is $(1/2)(3/2)+(1/2)(3/4)=9/8$.

For $k=3$, a similar case analysis shows that the answer is $85/48$ (if I calculated correctly.) It could be that no closed form can be found in general. If so, upper/lower bounds would still be interesting.

3

There are 3 best solutions below

0
On BEST ANSWER

Let $E_{b,k}(a)$ be the expected number of erased numbers when you have already erased $a$ numbers and you are at the $b$th shot, out of $k$ shots.

Those can be defined for $1 \le a,b \le k+1$ by induction :
When you don't have any shot left, $E_{k+1,k}(a) = a$.
When you do you have $E_{b,k}(a) = \frac 1 {2b}((a+1) E_{b+1,k}(a) + (2b-a-1) E_{b+1,k}(a+1))$.

It turns out that every $E_{b,k}$ is an affine function :
Suppose $E_{b+1,k}(a) = pa + q$ for some $p,q \in \Bbb R$.
Then $E_{b,k}(a) = \frac 1{2b} ((a+1)(pa + q)+(2b-a-1)(p(a+1)+q)) = (1-\frac 1 {2b})pa + (q+(1-\frac 1 {2b})p)$.

So, focusing on those coefficients, we have $p_{b,k} = (1-\frac 1 {2b}) p_{b+1,k}$ and $q_{b,k} = q_{b+1,k} + p_{b,k}$

Since $p_{k,k}=1$ and $q_{k,k}=0$ we have :
$p_{b,k} = \prod_{j=b}^k (1-\frac 1 {2j})$ and
$q_{b,k} = \sum_{j=b}^k p_{j,k}$.

From there it's easy to find that when we fix $b$, the sequences $(q_{b,l})_{k \ge b-1}$ satisfy the recurrence relation $q_{b,k+1} = (1-\frac 1 {2k+2})(1 + q_{b,k})$.

Since we are interested in $E_{1,k}(0)=q_{1,k}$, we get the sequence (I'll rename it $(a_k)$) that starts with
$a_0 = 0$
$a_1 = \frac 12$
$a_2 = (\frac 12 + 1)\frac 34 = \frac 98$
$a_3 = (\frac 98 + 1)\frac 56 = \frac {85}{48}$ and so on.


Let $f_k(x) = (x+1)(1-\frac 1{2k+2})$.

We can check that $f_k(\frac {2k-1}3) = \frac {2(k+1)-1}3$ and $f_k(\frac 23 k) < \frac 23 (k+1)$.

Since the $f_k$ are increasing, this implies by induction that forall $k$, $\frac {2k-1}3 < a_k \le \frac 23k$.

Letting $b_k$ be the error term $a_k - \frac {2k-1}3$, we have $b_{k+1} = (1-\frac 1 {2k+2})b_k = \frac {2k+1}{2k+2}b_k$. Then for $k \ge 1$, $b_k = \frac {(2k-1)!!}{(2k)!!}b_0 = \frac{(2k-1)!}{2^{2k-1}(k-1)!k!}b_0 = 2^{1-2k}\binom{k-1}{2k-1}b_0$

Since $b_0 = \frac 13$, we get the closed form $a_k = \frac13 ((2k-1)+2^{1-2k}\binom{k}{2k-1})$


As for the asymptotics of $(a_k)$, one can use Stirling's formula to see that it has a development in $1/\sqrt k$ at any order.

$a_k = \frac 13 (2k -1 + \frac 1 {\sqrt{k \pi}} - \frac 1 {8\sqrt{k\pi}k} + \frac 1 {128\sqrt{k\pi}k^2} + \ldots + o(k^{n- \frac 12}))$.

0
On

Brute-force calculation for $k=1$ to $25$ shows that the curve very quickly becomes near-linear. $$\begin{array}{|c|c|c|} \hline k & EV(k) & EV(k)-EV(k-1) \\ \hline 1 & 0.500 & 0.500000 \\ 2 & 1.125 & 0.625000 \\ 3 & 1.771 & 0.645833 \\ 4 & 2.424 & 0.653646 \\ 5 & 3.082 & 0.657552 \\ 6 & 3.742 & 0.659831 \\ 7 & 4.403 & 0.661296 \\ 8 & 5.065 & 0.662303 \\ 9 & 5.728 & 0.663030 \\ 10 & 6.392 & 0.663575 \\ 11 & 7.056 & 0.663997 \\ 12 & 7.720 & 0.664331 \\ 13 & 8.385 & 0.664600 \\ 14 & 9.050 & 0.664822 \\ 15 & 9.715 & 0.665006 \\ 16 & 10.380 & 0.665162 \\ 17 & 11.045 & 0.665295 \\ 18 & 11.711 & 0.665409 \\ 19 & 12.376 & 0.665508 \\ 20 & 13.042 & 0.665595 \\ 21 & 13.707 & 0.665672 \\ 22 & 14.373 & 0.665740 \\ 23 & 15.039 & 0.665800 \\ 24 & 15.705 & 0.665854 \\ 25 & 16.371 & 0.665903 \\ \hline\end{array}$$

0
On

Note: Here are some other aspects towards a closed expression of the expectation values $E_k(X)$ of erased numbers. We focus at the probability tree for $k\geq 1$. This tree is some kind of weighted Leibniz harmonic triangle.

We derive a recurrence formula, detect some nice symmetric relations and give a representation of specific coefficients in terms of harmonic numbers. To find some relations, we start with a closer look of the cases $k=3,4$.

Example $k=3$

\begin{array}{ccccccccc} \{1,2,3\}&\frac{1}{2}&\{2,3\}&\frac{2}{4}&\{3\}&\color{blue}{\frac{3}{6}}&\emptyset\\ \\ \frac{1}{2}&&\frac{2}{4}&&\color{blue}{\frac{3}{6}}\\ \\ \{1,2,3\}&\frac{3}{4}&\{2,3\}&\color{blue}{\frac{4}{6}}&\{3\}\\ \\ \frac{1}{4}&&\color{blue}{\frac{2}{6}}\\ \\ \{1,2,3\}&\color{blue}{\frac{5}{6}}&\{2,3\}\\ \\ \color{blue}{\frac{1}{6}}\\ \\ \{1,2,3\}\\ \end{array}

The probability tree for $k=3$ represents the remaining sets of $\{1,2,3\}$ as nodes and the transition probabilities after each step $l=0,1,2,3$. We could consider this tree as a triangle rotated counter-clockwise by ${90}^\circ$. The sets which are possible in step $l$ are written down diagonally.

If we look at the top left entry $\{1,2,3\}$ we see we obtain with probability $\frac{1}{2}$ the successor states $\{2,3\}$ horizontally and $\{1,2,3\}$ vertically.

The rightmost diagonal of transition probabilities shows the (blue) values
\begin{align*} \color{blue}{\frac{1}{6}}\quad\color{blue}{\frac{5}{6}}\qquad\color{blue}{\frac{2}{6}}\quad\color{blue}{\frac{4}{6}}\qquad\color{blue}{\frac{3}{6}}\quad\color{blue}{\frac{3}{6}}\tag{2} \end{align*} from bottom left to top right. We see the resulting sets in the rightmost diagonal. E.g. the set $\{2,3\}$ in the rightmost diagonal is obtained from $\{1,2,3\}$ with probability $\frac{5}{6}$ and from $\{2,3\}$ with probability $\frac{2}{6}$.

We calculate the probabilities of the finally obtained sets: \begin{array}{rl} \emptyset&xxx\\ &\frac{1}{2}\frac{2}{4}\frac{3}{6}=\frac{6}{48}=\frac{1}{8}\\ \\ \{3\}&yxx+xyx+xxy\\ &\frac{1}{2}\frac{3}{4}\frac{4}{6} +\frac{1}{2}\frac{2}{4}\frac{4}{6} +\frac{1}{2}\frac{2}{4}\frac{3}{6} =\frac{26}{48}=\frac{13}{24}\\ \\ \{2,3\}&yyx+yxy+xyy\\ &\frac{1}{2}\frac{1}{4}\frac{5}{6} +\frac{1}{2}\frac{3}{4}\frac{2}{6} +\frac{1}{2}\frac{2}{4}\frac{2}{6} =\frac{15}{48}=\frac{5}{16}\tag{1}\\ \\ \{1,2,3\}&yyy\\ &\frac{1}{2}\frac{1}{4}\frac{1}{6}=\frac{1}{48}\\ \end{array}

We obtain for $k=3$ the expected value $E_3(X)$ of erased numbers \begin{align*} E_3(x)&=3-\frac{13}{24}-2\frac{5}{16}-3\frac{1}{48}=\frac{85}{48}=1.7708\overline{3} \end{align*}

We observe nice regularities in (2)

  • The odd entries from left to right are $\frac{j}{2k}$ with $1\leq j \leq k$

  • The even entries from right to left are $\frac{k+j}{2k}$ with $0\leq j < k$

  • Two successive entries $\frac{2j+1}{2k},\frac{2j+2}{2k}$ sum up to $1$

We also see, that the resulting probabilities starting from the beginning set $\{1,2,3\}$ are derived similarly as we do when adding up values in a pascal triangle or leibniz harmonic triangle but weighted with the corresponding transition probabilities. If we look at the resulting set $\{2,3\}$ after three steps, we see in (1) it can be reached with probability $\frac{5}{16}$ and the paths are \begin{align*} yyx+yxy+xyy \end{align*} meaning that we use all paths containing one $x$-step horizontally and two $y$-steps vertically when starting from $\{1,2,3\}$ in the probability tree when we want to reach $\{2,3\}$. The number of $y$ in the path description is bijectively related to the number of remaining elements in the set $\{2,3\}$.

These symmetries and also the same transition probabilities are valid for $k$ in general as we can see, when we look at the example for $k=4$.

Example $k=4$

\begin{array}{ccccccccc} \{1,2,3,4\}&\frac{1}{2}&\{2,3,4\}&\frac{2}{4}&\{3,4\}&\frac{3}{6}&\{4\}&\color{blue}{\frac{4}{6}}&\emptyset\\ \\ \frac{1}{2}&&\frac{2}{4}&&\frac{3}{6}&&\color{blue}{\frac{5}{8}}\\ \\ \{1,2,3,4\}&\frac{3}{4}&\{2,3,4\}&\frac{4}{6}&\{3,4\}&\color{blue}{\frac{5}{8}}&\{4\}\\ \\ \frac{1}{4}&&\frac{2}{6}&&\color{blue}{\frac{3}{8}}\\ \\ \{1,2,3,4\}&\frac{5}{6}&\{2,3,4\}&\color{blue}{\frac{6}{8}}&\{3,4\}\\ \\ \frac{1}{6}&&\color{blue}{\frac{2}{8}}\\ \\ \{1,2,3,4\}&\color{blue}{\frac{7}{8}}&\{2,3,4\}\\ \\ \color{blue}{\frac{1}{8}}\\ \\ \{1,2,3,4\} \end{array}

Obseve that we have in the leftmost three diagonals the same probability transitions as in the example for $k=3$.

\begin{array}{rl} \emptyset&xxxx\\ &\frac{1}{2}\frac{2}{4}\frac{3}{6}\frac{4}{8}=\frac{24}{384}=\frac{1}{16}\\ \\ \{4\}&yxxx+xyxx+xxyx+xxxy\\ &\frac{1}{2}\frac{3}{4}\frac{4}{6}\frac{5}{8} +\frac{1}{2}\frac{2}{4}\frac{4}{6}\frac{5}{8} +\frac{1}{2}\frac{2}{4}\frac{3}{6}\frac{5}{8} +\frac{1}{2}\frac{2}{4}\frac{3}{6}\frac{4}{8} =\frac{154}{384}=\frac{77}{192}\\ \\ \{3,4\}&yyxx+yxyx+yxxy+xyyx+xyxy+xxyy\\ &\frac{1}{2}\frac{1}{4}\frac{5}{6}\frac{6}{8} +\frac{1}{2}\frac{3}{4}\frac{2}{6}\frac{6}{8} +\frac{1}{2}\frac{3}{4}\frac{4}{6}\frac{3}{8} +\frac{1}{2}\frac{2}{4}\frac{2}{6}\frac{6}{8} +\frac{1}{2}\frac{2}{4}\frac{4}{6}\frac{3}{8} +\frac{1}{2}\frac{2}{4}\frac{3}{6}\frac{3}{8} =\frac{168}{384}=\frac{7}{16}\\ \\ \{2,3,4\}&yyyx+yyxy+yxyy+xyyy\\ &\frac{1}{2}\frac{1}{4}\frac{1}{6}\frac{7}{8} +\frac{1}{2}\frac{1}{4}\frac{5}{6}\frac{2}{8} +\frac{1}{2}\frac{3}{4}\frac{2}{6}\frac{2}{8} +\frac{1}{2}\frac{2}{4}\frac{2}{6}\frac{2}{8} =\frac{37}{384}\\ \\ \{1,2,3,4\}&yyyy\\ &\frac{1}{2}\frac{1}{4}\frac{1}{6}\frac{1}{8}=\frac{1}{384}\\ \end{array}

We obtain for $k=4$ the expected value $E_4(X)$ of erased numbers \begin{align*} E_4(x)&=4-\frac{77}{192}-2\frac{7}{16}-3\frac{37}{384}-4\frac{1}{384}=\frac{931}{384}=2.4244791\overline{6} \end{align*}

From these examples it is easy to derive a recurrence relation for the transition probabilities.

$$ $$

Recurrence relation:

The example trees above together with the observations at the end of example $k=3$ indicate the following recurrence relation of the transition probabilities.

Let $p_{k,l}$ denote the probability to obtain the set $\{l+1,l+2,\ldots,k\}$ when starting with the set $\{1,2,\ldots,k\}$ with $0\leq l < k$. The following holds valid

\begin{align*} p_{k,l}&=\frac{2k-l}{2k}p_{k-1,l-1}+\frac{l+1}{2k}p_{k-1,l}&\qquad 1\leq l<k\tag{3}\\ p_{k,0}&=\frac{1}{2^kk!}&\qquad k\geq 1\\ p_{k,k}&=\frac{1}{2^k}&\qquad k\geq 1 \end{align*}

The expectation value $E_k(X)$ describing the number of expected erased numbers is given by \begin{align*} E_k(x)=k-\sum_{l=1}^kkp_{k,l}\qquad k\geq 0 \end{align*}

$$ $$

Generating functions and harmonic numbers:

It would be nice to derive a generating function \begin{align*} A(x,y)=\sum_{k,l\geq 0}p_{k,l}x^ky^l \end{align*} for the probabilities $p_{k,l}$ and to obtain a closed expression with the help of $A(x,y)$. This approach leads to a first order linear partial differential equation in $x$ and $y$ which I wasn't able solve. Somewhat more modest we focus on generating functions corresponding to a row or a column of the matrix representation in the examples above.

We observe that according to (3) the generating functions for $p_{k,0}$ and $p_{k,k}$ are \begin{align*} \sum_{k\geq 0}p_{k,0}x^k&=\sum_{k\geq 0}\frac{1}{2^kk!}x^k=\exp\left(\frac{x}{2}\right)\\ \sum_{k\geq 0}p_{k,k}x^k&=\sum_{k\geq 0}\frac{1}{2^k}=\frac{1}{1-\frac{x}{2}} \end{align*} So, we derive a geometric series for the entries of the first row in the matrix representation of the probability tree and an exponential series for the entries of the first column.

We consider now the generating functions for column entries. We define for $m\geq 0$ the generating function $T_m(x)$ as \begin{align*} T_m(x)=\sum_{k\geq m}p_{k,k-m}x^k \end{align*}

This corresponds to the row entries of matrix representation of the probability tree. We obtain from (3) the following recurrence relation

\begin{align*} p_{k,k-m}&=\frac{k+m}{2k}p_{k-1,k-m-1}+\frac{k-m+1}{2k}p_{k-1,k-m}&\qquad 1\leq m<k\tag{4}\\ p_{k,0}&=\frac{1}{2^kk!}&\qquad k\geq 1\\ p_{k,k}&=\frac{1}{2^k}&\qquad k\geq 1 \end{align*}

We consider the special case $m=1$ and show the following is valid for the probabilities $p_{k,k-1}$ with $k\geq 1$

  • The probabilities $p_{k,k-1}$ follow the recurrence relation

\begin{align*} p_{k,k-1}&=\frac{k+1}{2k}p_{k-1,k-2}+\frac{k}{2k}p_{k-1,k-1}&\qquad 1\leq m<k\tag{5}\\ p_{k,0}&=\frac{1}{2^kk!}&\qquad k\geq 1\\ p_{k,k}&=\frac{1}{2^k}&\qquad k\geq 1 \end{align*}

  • A generating function $T_1(x)$ for the probabilities $p_{k,k-1}$ is \begin{align*} T_1(x)=\sum_{k\geq 1}p_{k,k-1}x^k=-\frac{\log\left(1-\frac{x}{2}\right)}{\left(1-\frac{x}{2}\right)^2}\tag{6} \end{align*}

  • The generating function $T_1(x)$ fulfills the following linear differential equation \begin{align*} \left(1-\frac{x}{2}\right)T_1^{\prime}(x)=T_1(x)+\frac{1}{2\left(1-\frac{x}{2}\right)^2}\tag{7} \end{align*}

  • The probabilities $p_{k,k-1}$ are the generalised harmonic numbers $H_k^{(2)}$

\begin{align*} p_{k,k-1}=H_k^{(2)}=\sum_{l=1}^{k}H_l\qquad\qquad k\geq 1\tag{8} \end{align*}

The recurrence relation (5) follows immediately from (4) by setting $m=1$. In order to obtain the generating function $T_1(x)$ we use the recurrence relation (5) and obtain \begin{align*} \sum_{k\geq 2}2kp_{k,k-1}x^k&=\sum_{k\geq 2}(k+1)p_{k-1,k-2}x^k+\sum_{k\geq 2}2kp_{k-1,k-1}x^k\\ 2xD_x\sum_{k\geq 2}p_{k,k-1}x^k&=\left(xD_x+I\right)\sum_{k\geq 2}p_{k-1,k-2}x^k+xD_x\sum_{k\geq 2}\frac{1}{2^{k-1}}x^k\tag{9}\\ 2xD_x\left(T_1(x)-\frac{x}{2}\right)&=\left(xD_x+I\right)\sum_{k\geq 1}p_{k,k-1}x^{k+1} +2xD_x\left(\frac{1}{1-\frac{x}{2}}-1-\frac{x}{2}\right)\\ 2xD_xT_1(x)-x&=\left(xD_x+I\right)(xT_1(x))+\frac{4x}{(x-2)^2}-x\tag{10}\\ 2xD_xT_1(x)&=\left(x^2D_x+2xI\right)T_1(x)+\frac{4x}{(x-2)^2}\tag{11}\\ (2x-x^2)D_xT_1(x)&=2xT_1(x)+\frac{4x}{(x-2)^2}\\ \left(1-\frac{x}{2}\right)D_xT_1(x)&=T_1(x)+\frac{1}{2\left(1-\frac{x}{2}\right)^2} \end{align*}

and (7) follows.

Comment:

  • In (9) we use the operator notation $D_x=\frac{d}{dx}$ for the derivation and the identity operator $I$.

  • In (10) we do some index shifting und use the formula for the geometric series

  • In (11) we use the Leibniz product rule $D_xx=xD_x+I$

Verifying that $T_1(x)=-\frac{\log\left(1-\frac{x}{2}\right)}{\left(1-\frac{x}{2}\right)^2}$ fulfills the differential equation and noting that $T_1(0)=0$ the claim (6) follows.

Since a generating function for the harmonic numbers $H_k=\sum_{l=1}^k\frac{1}{l}$ is \begin{align*} \sum_{k=1}^{\infty}H_kx^k=-\frac{\log(1-x)}{1-x} \end{align*} and multiplication of a generating function with $\frac{1}{1-x}$ means summing up the coefficients, the claim (8) follows.

$$ $$

Possible ways to proceed:

It would be nice to find an explicit expression for the $p_{k,l}$ with $0\leq l < k$, which seem to be strongly connected with harmonic numbers. Some ideas to proceed are

  • Derive a generating functions $T_m(x)$ for $m\geq 2$ which could help to indicate the structure of the generating function $A(x,y)$

  • Find a generating function $A(x,y)$ by directly using the recurrence relation (3)

  • The kind of obtaining the probabilities $p_{k,l}$ as weighted paths via $x$-steps and $y$-steps as mentioned above indicates also a relationship to $q$-binomial coefficients.