Alternative strategy to choose the largest random number

169 Views Asked by At

I want to discuss a strategy to determine what is the largest number of a random sample.

Suppose that we generate a random sample of $n$ distinct numbers from $0$ to $1$ according to a uniform distribution. The sample generated will be labeled as $I = \{d_1,...,d_n\}$ where $0\leq d_1 < d_2 < ... < d_n\leq 1$ (I have sorted the sample for simplicity).

Let us randomly shuffle the sample. This would give us a particular permutation of $I$ that I will label as

$$ S = \{z_1,z_2,...,z_n\} $$

Each of the elements $z_i$ is shown one by one to a person who has to decide whether $z_i$ is the maximum of the whole sample. If this person chooses $z_i$ as the maximum and they are right, they win, otherwise, they lose. If this person doesn't choose $z_i$, then they need to make a decision about $z_{i+1}$. They can accept $z_{i+1}$ as the maximum, and check if they win, or they refuse $z_{i+1}$ and check $z_{i+2}$ and so on.

I want to test the following strategy to make a decision. Notice that I am not sure if the following strategy is optimal in any sense (and I don't care at this point).

  1. Assume that we refuse the first $k-1$ elements. If the $k$-th element $z_k$ is greater than all the others drawn so far, we consider $z_k$ as a candidate for the maximum.
  2. If $z_k$ is a candidate, the probability that the next $n-k$ elements are smaller than $z_k$ is $z_k^{n-k}$.
  3. I decide the element $z_k$ is the maximum if $z_k^{n-k}\geq r$ for some cut-off $r$ that maximizes the probability of winning the game under such strategy.

Ultimately, I want then to find the value of such cut-off $r$. Let me define the following events

$$ A_k : z_k = d_n \\ B_k: z_k^{n-k}\geq r\, \cap\,z_k > z_i \forall i = 1,..,k-1 $$ I think that the probability of winning under these conditions is $$ P(r) = \sum_{k=1}^{n}p(B_k)p(A_k|B_k)\,. $$ Question: Is the expression for $P(r)$ right?

By means of the Bayes theorem, $P(r)$ takes the form $$ P(r) =\sum_{k=1}^{n}p(A_k)p(B_k|A_k)\,. $$

Now, $$p(A_k)= 1/n$$ and $P(B_k|A_k)$ is simply the probability that $d_n^{n-k}\geq r$, since $d_n > z_i \forall i = 1,..,k-1 $ is always true, therefore $$ P(B_k|A_k) = (1-r^\frac{1}{n-k}) $$ and therefore I get

$$ P(r) = \sum_{k=1}^{n}\frac{1}{n}(1-r^\frac{1}{n-k}) $$

However, this cannot be correct. For $r=0$, this last equation says that $P(0)=1$, which is really wrong. I have the feeling I have messed around with the formulae and fundamental concepts of probability. Can anyone help me?

EDIT I have implemented this strategy numerically on Mathematica for $n=100$. Look at the following code

checkStrategy[r_, n_] := Module[{win = 0, loss = 0, list, max, k, i},
  For[i = 1, i <= 1000, i++,
   list = RandomSample[Range[10 n], n]/(10 n) // N;
   max = Max[list];
   For[k = 1, k <= n, k++,
    If[list[[k]]^(n - k) >= r && list[[k]] > Max[list[[1 ;; k - 1]]], 
     If[list[[k]] == max, win = win + 1; k = n + 1, k = n + 1];]
    ];
   ];
  loss = 1000 - win;
  Return[{r, win/(win + loss), loss/(win + loss)} // N]]

SeedRandom[42];
points = Table[checkStrategy[s, 100], {s, 0, 0.99, 0.01}];
points[[All, 1 ;; 2]] // ListPlot

which returns the following plot enter image description here

As you can see, the probability has a maximum for values of $r$ between 0.4 and 0.6.

Notice that this is Problem 48 of the book "Fifty Challenging Problems in Probability with Solutions". In this book, the authors give a strategy whose probability of success is 0.58 at large $n$ (not so different from what I have found numerically).

2

There are 2 best solutions below

4
On

We have rejected $k-1$ numbers. Now the scenario looks like this:

The $k^{th}$ position can be occupied by one of $d_k, d_{k+1}, ... d_n$

Let $d_k$ be in position $k$. We can arrange this $\binom {k-1}{k-1} (k-1)! (n-k)!$ ways

if $d_{k+1}$ in position $k$, we get $\binom {k+1}{k-1} (k-1)! (n-k)!$ ways

... and if $d_{n}$ in position $k$, we get $\binom {n-1}{k-1} (k-1)! (n-k)!$ ways

For a total of $(k-1)! (n-k)! \left(\binom {k-1}{k-1} + \binom {k+1}{k-1} + ... + \binom {n-1}{k-1} \right)$

= $(k-1)!(n-k)! \binom{n}{n-k} = (k-1)!(n-k)! \binom{n}{k} = n!/k$

The number of ways we win is $\binom {n-1}{k-1} (k-1)! (n-k)! = (n-1)!$

So probability of winning if we accept $k^{th}$ number is $k/n$.

So if you have a cutoff of $r$, then $k \ge nr$

Let us check:

If $r = 1/n$, then $k \ge 1$ which makes sense.

if $r = 1$, then $k \ge n$ which is not possible since we only have $n$ numbers.

if $r = 0$, then $k \ge 0$.

We need $k-1 \le n-k \implies k \le \frac{n+1}{2}$

So $nr \le k \le \frac{n+1}{2}$

$r$ is max when $k = \lfloor{\frac{n+1}{2}}\rfloor$

2
On

I think I see two errors. There may be others I have not detected.

First, in the formula for $P(r)$, you choose the $k$th permuted element only if $B_k$ is true and this is the first element $i$ for which $B_i$ has been true. You could write it this way:

$$ P(r) =\sum_{k=1}^n p(A_k) p(B_k \cap \bar B_1 \cap \bar B_1 \cap \cdots \cap \bar B_{k-1} \mid A_k). $$

This makes the calculation more laborious. Maybe there's another way.

Where your formula goes wrong when $r = 0$ is that when $r=0$ you will always simply choose the first element in the permutation. In that case there should not be any contribution to the sum from any other terms after the first one.

Second, assuming that your original sample had $n$ independent uniformly distributed values which you then sorted, that is, $d_n$ is the $n$th order statistic of the set, $d_n$ is not uniformly distributed. If we were to take one random value $X$, uniformly distributed on $[0,1]$, the probability that $X < t$ is $t$ if $0 \leq t \leq 1.$ But here we have $n$ such random values, and $d_n$ is the greatest of them. In order for it to happen that $d_n < t,$ every one of the $n$ independent uniform random numbers must be less than $t.$ The chance of this happening is $t^n.$

We have $d_n^{n-k}\geq r$ if and only if $d_n\geq r^{1/(n-k)}.$ Let $t = r^{1/(n-k)}$; then $$P(d_n < r^{1/(n-k)}) = P(d_n < t) = t^n = r^{n/(n-k)},$$ and therefore

$$ P(d_n^{n-k}\geq r) = 1 - r^{n/(n-k)}. $$

When $0 < r < 1$ this will be greater than $1 - r^{1/(n-k)}.$