For example, I have the win probability vector p = [0.2, 0.5, 0.8], which means that the first player wins with probability 0.2 against a random player, the second player wins with probability 0.5 against a random player, and so on.
I want to generate a pairwise matrix containing every one-on-one win probability. I wrote this formula:
$$P(\textrm{A wins against B}) = \frac{P(\textrm{A wins}) \cdot (1 - P(\textrm{B wins}))}{P(\textrm{A wins}) \cdot (1 - P(\textrm{B wins})) + (1 - P(\textrm{A wins})) \cdot P(\textrm{B wins})}$$
So, to generate the matrix, we can use this formula:
$$M_{ij} = \frac{p_i \cdot (1 - p_j)}{p_i \cdot (1 - p_j) + (1 - p_i) \cdot p_j}$$
where p is the win probability vector and M is the matrix I want to generate.
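As a compact illustration, the matrix formula above can be sketched in vectorized NumPy. The function name `pairwise_matrix` is hypothetical, and the sketch assumes every entry of p lies strictly between 0 and 1 (otherwise the denominator can be zero).

```python
import numpy as np

def pairwise_matrix(p):
    """Return M with M[i, j] = p_i(1 - p_j) / (p_i(1 - p_j) + (1 - p_i)p_j).

    Assumes 0 < p_i < 1 for every i (hypothetical helper, not from the post).
    """
    p = np.asarray(p, dtype=float)
    num = np.outer(p, 1.0 - p)   # num[i, j] = p_i * (1 - p_j)
    return num / (num + num.T)   # num.T[i, j] = (1 - p_i) * p_j

M = pairwise_matrix([0.2, 0.5, 0.8])
print(np.round(M, 3))
```

For the example vector, the diagonal entries equal 0.5 and each off-diagonal pair satisfies `M[i, j] + M[j, i] == 1` up to floating-point error.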
My question is: what is the right formula?
Because when I try to "prove" the formula empirically, I get results that are near the expected result (but not exact).
Here is the Python code of the proof:
import numpy as np
from scipy import stats
import random

def getRandomFloat(min=0.0, max=1.0, decimalMax=2):
    return round(random.uniform(min, max), decimalMax)

def truncateFloat(f, n=2):
    '''Truncates/pads a float f to n decimal places without rounding'''
    s = '{}'.format(f)
    if 'e' in s or 'E' in s:
        return float('{0:.{1}f}'.format(f, n))
    i, p, d = s.partition('.')
    return float('.'.join([i, (d+'0'*n)[:n]]))

def generate_pairwise_win_prob(win_prob, float_precision=None):
    # We create the pairwise win probability (`p_win_prob`):
    p_win_prob = np.zeros((len(win_prob), len(win_prob)))
    w = win_prob
    for i in range(len(win_prob)):
        for j in range(i, len(win_prob)):
            # p_win_prob[i, j] = 1 / (1 + np.exp(w[j] - w[i])) # The Bradley-Terry-Luce model doesn't work
            p_win_prob[i, j] = (w[i] * (1 - w[j])) / (w[i] * (1 - w[j]) + (1 - w[i]) * w[j])
            if float_precision is not None:
                p_win_prob[i, j] = truncateFloat(p_win_prob[i, j], float_precision)
            p_win_prob[j, i] = 1 - p_win_prob[i, j]
    return p_win_prob

def pwp_empirical_proof(win_prob, draw_prob_interval=None):
    p_win_prob = generate_pairwise_win_prob(win_prob)
    victories = [0] * len(win_prob)
    defeats = [0] * len(win_prob)
    for i in range(100000):
        # Pick two distinct players uniformly at random and play one match
        a, b = random.sample(range(len(win_prob)), 2)
        result = match(a, b, p_win_prob, draw_prob_interval=draw_prob_interval)
        if result != 0:
            if result == 1:
                victories[a] += 1
                defeats[b] += 1
            else:
                victories[b] += 1
                defeats[a] += 1
    predicted_win_prob = []
    for i in range(len(win_prob)):
        current = victories[i] / (victories[i] + defeats[i])
        current = truncateFloat(current, 2)
        predicted_win_prob.append(current)
    print("win_prob: " + str(win_prob))
    print("predicted_win_prob: " + str(predicted_win_prob))
    print()

# We define the function that will give the result of a match:
def match(i, j, p_win_prob, draw_prob_interval=None):  # draw a comparison from the model
    assert i != j
    rdf = getRandomFloat()
    if draw_prob_interval is not None and abs(p_win_prob[i, j] - rdf) <= draw_prob_interval:
        return 0  # draw
    elif rdf < p_win_prob[i, j]:
        return 1  # i beats j
    else:
        return -1  # j beats i

pwp_empirical_proof([0.2, 0.5, 0.8])
And I get:
win_prob: [0.2, 0.5, 0.8]
predicted_win_prob: [0.12, 0.49, 0.87]
This is more an extended comment than an answer, partly because, as I indicated in my comments, I don't think it is possible to answer the question as given.
As I said, we need more information; a player's winning percentage against competitors at large doesn't determine his probability of beating a particular opponent, even if we know the opponent's winning percentage.
You can make other assumptions to try to compute the probability of A beating B, but you must make sure that they are consistent. The formula you suggest, $$P_{AB}=\frac{P_A(1-P_B)}{P_A(1-P_B)+P_B(1-P_A)}$$ (where $P_{AB}$ is the probability that A beats B), has the property that $P_{AB}+P_{BA}=1$ when $P_{BA}$ is computed according to the analogous formula, which is mandatory if there are no ties.
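To spell out why the two probabilities sum to 1: swapping A and B leaves the denominator unchanged, so the two fractions share it and their numerators add up to exactly that denominator (assuming it is nonzero):
$$P_{AB}+P_{BA}=\frac{P_A(1-P_B)+P_B(1-P_A)}{P_A(1-P_B)+P_B(1-P_A)}=1$$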
However, if the players are $A_1,A_2,\dots,A_n$, we also need, for example,
$$P_{A_1}=\frac{1}{n-1}\left(P_{A_1A_2}+P_{A_1A_3}+\cdots+P_{A_1A_n}\right)\tag1$$ if $A_1$ is equally likely to play any of the other players. Does your formula guarantee this? I doubt it.
We might try to rectify this with a formula like $$P_{A_kA_j}=(n-1)\,P_{A_k}\frac{P_{A_j}}{\sum_{m\neq k}P_{A_m}}$$
With this definition, equation $(1)$ would be satisfied, but we would no longer have $P_{AB}+P_{BA}=1$, and the values are not even guaranteed to stay at or below 1.
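A quick numerical check on the question's example may make the consistency issue concrete. The sketch below assumes that, with equally likely opponents, a player's overall win probability is the average of their pairwise probabilities; the variable names are illustrative only.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.8])
n = len(p)

# Pairwise matrix from the formula proposed in the question.
num = np.outer(p, 1.0 - p)   # num[i, j] = p_i * (1 - p_j)
M = num / (num + num.T)

# Overall win rate implied by M when every other player is an equally
# likely opponent: the mean of the off-diagonal entries of each row.
off_diag = ~np.eye(n, dtype=bool)
implied = (M * off_diag).sum(axis=1) / (n - 1)

print("given p:  ", p)
print("implied p:", np.round(implied, 3))
```

Under these assumptions the implied vector comes out at roughly [0.129, 0.5, 0.871] rather than [0.2, 0.5, 0.8], which is in line with the simulated values reported in the question: the simulation appears to be converging to what the matrix implies, not to the original p.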
In short, I can't think of a formula, at least not off the top of my head, that would guarantee consistency of the results. Also, it seems to me that finding such a formula, if one exists, would require making entirely unrealistic assumptions about the problem.