Order statistics of scaled beta distributions (Project Euler 573)

269 Views Asked by At

I am trying to solve Project Euler problem 573.

To summarize my understanding of the problem: in a race with $n$ contenders, runner $k$ runs at speed $v_{k} = k/n$ and has to cover the distance $D_k = X_{(k)}$ where $X_i \sim U(0,1)$ (i.e. $D_k$ is the $k$th order statistic of a uniform random variable). The problem asks the expected value for the winner of a race with $n$ runners : $E_n = \mathbf{E}[P_{n,k}]$, where $P_{n,k}$ is the probability that runner $k$ wins the race.

What I have so far:

We know that $D_k \sim B(k, n + 1 - k)$, therefore the running time of runner $k$ is $T_k = n/k \cdot D_k \sim B(k, n + 1 - k, 0, n/k)$ : a scaled beta distribution.

$P_{n,k}$ is the probability that runner $k$ has the fastest time: $P_{n,k} =\prod_{i \neq k}P\left(T_k < T_i \right)$

EDIT : $T_k$ are dependent so the correct formula is $P_{n, k} = P\left(\underset{i \neq k}\cap(T_k < T_i) \right)$

Where I'm stuck: I actually have two problems.

The first one is that I can't make anything of $P(T_k < T_i)$ except ugly integrals : is there a closed form for $P(T_k < T_i)$, or another way to find the minimum of $n$ scaled beta distributions with different parameters ?

The second problem is that I'm doubting that $T_k \sim B(k, n + 1 - k, 0, n/k)$. Why ? I wrote some code to simulate races, and the code using a beta variable doesn't output the same results as the (correct) code that sorts a uniform r.v. To illustrate :

def do_one_race(n):
    dist = []
    for k in xrange(n):
        dist.append(random.random())
    dist.sort()
    best_time = 100
    winner = 0
    for k in xrange(1, n+1):
        time = dist[k-1]*n/k
        if time < best_time:
            best_time = time
            winner = k
    return (winner, best_time)

When run 10,000 times with 3 runners, this simulation yields [0.447, 0.223, 0.331] for the probability that each runner wins the race, very close to the theoretical value $[4/9, 2/9, 1/3]$. The average winning time is 0.499 (theoretical value $1/2$).

def do_one_race_beta(n):
    best_time = 100
    winner = 0
    for k in xrange(1, n+1):
        dist = random.betavariate(k, n + 1 - k)
        time = dist*n/k
        if time < best_time:
            best_time = time
            winner = k
    return (winner, best_time)

On the other hand, using a beta distribution to generate the distances, I get different results: [0.465, 0.298, 0.235] (average winning time : 0.42) for 3 runners for instance. This is driving me nuts : I can't see the flaw in my math or the bug in my code !

1

There are 1 best solutions below

2
On BEST ANSWER

The problem in your calculations is that you assume that the startings positions $D_k$ are independent. However, since they all have support $[0, 1]$ and $D_k < D_{k +1}$ almost surely, they are not independent. Consequently, the running times $T_k$ are also dependent, which is why your second simulation yields incorrect values.