Given an analysis of every pair of competitors in a race, how may I determine the probability of any given competitor winning the race?
For example, what is the probability of competitor 2 winning the following race? P(Cx Win) means the probability of Competitor x winning.
Cx  Cy  P(Cx Win)  P(Cy Win)
----------------------------------
1   2   0.3        0.7
1   3   0.4        0.6
1   4   0.9        0.1
2   3   0.8        0.2
2   4   0.7        0.3
3   4   0.9        0.1
I have tried to calculate a 'rating' for each competitor by adding their individual win probabilities. For example, the rating for C1 would be 0.3 + 0.4 + 0.9 = 1.6, the rating for C2 would be 0.7 + 0.8 + 0.7 = 2.2, and so on. I've tried different ways of using this rating to find the probability of each competitor winning, but my gut feeling tells me something is wrong.
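For reference, the rating computation described above can be reproduced with a short Python sketch. The names `pairwise` and `ratings` are chosen here for illustration only; the numbers are the ones from the table.

```python
# Pairwise table from the question: (x, y) -> P(Cx beats Cy).
pairwise = {
    (1, 2): 0.3, (1, 3): 0.4, (1, 4): 0.9,
    (2, 3): 0.8, (2, 4): 0.7, (3, 4): 0.9,
}

def ratings(pairwise):
    """Sum each competitor's pairwise win probabilities."""
    totals = {}
    for (x, y), p in pairwise.items():
        totals[x] = totals.get(x, 0.0) + p          # Cx wins with probability p
        totals[y] = totals.get(y, 0.0) + (1.0 - p)  # Cy wins with probability 1 - p
    return totals

r = ratings(pairwise)
for c in sorted(r):
    print(f"C{c}: {r[c]:.1f}")
```

The four ratings always add up to 6, the number of pairs, since each pair contributes exactly 1 in total.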
Is there a mathematical solution to this problem?
Presumably your table gives the win probabilities in two-person races between each pair of competitors. This does not necessarily determine what happens in a four-person race. What you need is a model for what happens in such a race.
One possible type of model is that each competitor's time for any given race is a random variable with a distribution from some family of distributions, these random variables all being independent. If you knew the parameters, you could (in principle) determine the win probabilities in both two-person and four-person races.
For example, suppose competitor $i$'s time $T_i$ is normal with mean $\mu_i$ and standard deviation $\sigma_i$. Then $T_i - T_j$ is normal with mean $\mu_{ij} = \mu_i - \mu_j$ and standard deviation $\sigma_{ij} = \sqrt{\sigma_i^2 + \sigma_j^2}$. The probability that competitor $i$ wins against competitor $j$, i.e. that $T_i - T_j < 0$, is $\Phi(-\mu_{ij}/\sigma_{ij})$, where $\Phi$ is the standard normal CDF.
This leads to six equations involving the eight parameters $\mu_i$, $\sigma_i$. However, there are two symmetry operations that preserve all probabilities: adding a constant to all $\mu_i$, and scaling all $\mu_i$ and $\sigma_i$ by a positive constant factor. So we can assume, say, that $\mu_1 = 0$ and $\sigma_1 = 1$, and hope that the six equations determine the other six parameters.
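To make the model concrete, here is a minimal Python sketch of both directions: the closed-form two-person win probability $\Phi(-\mu_{ij}/\sigma_{ij})$, and a Monte Carlo estimate of the four-person win probabilities. The function names and the parameter values are hypothetical, picked only for illustration; they are not fitted to the table in the question.

```python
import random
from statistics import NormalDist

def pairwise_win_prob(mu_i, sigma_i, mu_j, sigma_j):
    """P(T_i < T_j) for independent normal times: Phi(-(mu_i - mu_j) / sigma_ij)."""
    sigma_ij = (sigma_i ** 2 + sigma_j ** 2) ** 0.5
    return NormalDist().cdf(-(mu_i - mu_j) / sigma_ij)

def race_win_probs(mus, sigmas, trials=100_000, seed=0):
    """Monte Carlo estimate of each competitor's chance of posting the lowest time."""
    rng = random.Random(seed)
    wins = [0] * len(mus)
    for _ in range(trials):
        times = [rng.gauss(m, s) for m, s in zip(mus, sigmas)]
        wins[times.index(min(times))] += 1  # lowest time wins the race
    return [w / trials for w in wins]

# Hypothetical parameters (assumption: NOT derived from the question's data).
mus = [0.0, -0.5, -0.2, 1.5]
sigmas = [1.0, 1.0, 1.0, 1.0]

print(pairwise_win_prob(mus[0], sigmas[0], mus[1], sigmas[1]))  # two-person race, C1 vs C2
print(race_win_probs(mus, sigmas))                              # four-person race
```

Simulation is used for the four-person race because the probability that one normal variable is the smallest of four has no simple closed form in general.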
Unfortunately, with your data it appears that there is no real solution to those six equations. So my model doesn't seem to fit your data.
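One way to see the obstruction, as a sketch under the normal model above: write $p_{ij}$ for the tabulated probability that competitor $i$ beats competitor $j$, and $z_{ij} = \Phi^{-1}(p_{ij})$ (this notation is introduced here for convenience). Each equation $\Phi(-\mu_{ij}/\sigma_{ij}) = p_{ij}$ is then equivalent to $\mu_j - \mu_i = z_{ij}\,\sigma_{ij}$. With $z_{12} \approx -0.5244$, $z_{14} \approx 1.2816$ and $z_{24} \approx 0.5244$, the equations for the pairs $(1,2)$ and $(1,4)$ give

$$\mu_4 - \mu_2 = (\mu_4 - \mu_1) - (\mu_2 - \mu_1) = z_{14}\,\sigma_{14} - z_{12}\,\sigma_{12} \approx 1.2816\,\sigma_{14} + 0.5244\,\sigma_{12},$$

while the equation for the pair $(2,4)$ requires $\mu_4 - \mu_2 = z_{24}\,\sigma_{24} \approx 0.5244\,\sigma_{24}$. But $\sigma_{24} = \sqrt{\sigma_2^2 + \sigma_4^2} \le \sigma_2 + \sigma_4 \le \sigma_{12} + \sigma_{14}$, so

$$0.5244\,\sigma_{24} \le 0.5244\,\sigma_{12} + 0.5244\,\sigma_{14} < 0.5244\,\sigma_{12} + 1.2816\,\sigma_{14},$$

and the two expressions for $\mu_4 - \mu_2$ cannot agree for any positive standard deviations. Informally, competitor 2 beats competitor 1 comfortably and competitor 1 beats competitor 4 heavily, yet competitor 2 is supposed to beat competitor 4 only as comfortably as it beats competitor 1.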