World Cup: what group stage result (vector) is most likely?


In the World Cup 2018 group stage, a group consists of 4 teams, and every team plays every other team once, for a total of 6 games. A team gets 3 points per win, 1 point per draw, and 0 points per loss. After all 6 games, the teams are ranked by total number of points.

Let $v$ be the sorted 4-vector of total points. E.g. $v=(9,6,3,0)$ represents a group where there are no draws, the top team beats all others, the 2nd top team beats the other two, and the 3rd top team only beats the worst team. Similarly, $v=(3,3,3,3)$ represents a group where all 6 games are drawn.

(Note: only a small number of different vectors are possible. This was asked in World Cup Standings but has no posted answer.)

My question is: which $v$ is most likely? Obviously this depends on the probability model. For the purpose of this question, assume (unrealistically) that each game is i.i.d., has probability $p$ of being drawn, and each team wins with equal probability ${1-p \over 2}$.

What I seek: as a function of $p$, which $v$ is most likely?

Further comments:

  • Obviously, for any given value of $p$, the solution can be found numerically (exactly and/or Monte Carlo with high precision). However, I'm hoping for a more intuitive answer using e.g. symmetry arguments, graph theory, entropy(?!) etc.

  • I'm also interested in transitions as $p$ changes from $0$ to $1$. (E.g. as $p$ goes from $0$ to $1$, $\Pr(v=(3,3,3,3))$ also goes from $0$ to $1$, but at what point does $(3,3,3,3)$ become the most likely?)

  • If you can't solve the general problem, it might still be interesting to know the answer for "typical" values of $p$. E.g. as of this writing - the last games of Groups E & F just finished - there are 9 draws out of 44 games, so $p=9/44 \approx 0.2$.

  • Lastly, one possible approach which I thought about for a bit but didn't make much progress on: the problem might be easier if, instead of varying $p$, we vary the number of draws $D \in \{0,1,\dots,6\}$. I.e. conditioned on $D$ draws out of 6 games, which $v$ is most likely? This may provide an intermediate step to answering my original question (because $D$ is distributed $\text{Binomial}(6,p)$).
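For reference, the "exact numerical" route mentioned in the first bullet can be sketched in a few lines of Python. This is only a brute-force enumeration of the $3^6$ outcomes under the i.i.d. model stated above; the function name `vector_probabilities` and the outcome encoding are my own choices, not part of the question.

```python
from fractions import Fraction
from itertools import combinations, product

def vector_probabilities(p):
    """Exact distribution of the sorted points vector for draw probability p."""
    games = list(combinations(range(4), 2))  # the 6 pairings of 4 teams
    win = (1 - p) / 2                        # probability that a given side wins
    dist = {}
    # outcome code per game: 0 = draw, 1 = first team wins, 2 = second team wins
    for outcome in product(range(3), repeat=len(games)):
        points = [0, 0, 0, 0]
        prob = Fraction(1)
        for (a, b), result in zip(games, outcome):
            if result == 0:
                points[a] += 1
                points[b] += 1
                prob *= p
            elif result == 1:
                points[a] += 3
                prob *= win
            else:
                points[b] += 3
                prob *= win
        v = tuple(sorted(points, reverse=True))
        dist[v] = dist.get(v, 0) + prob
    return dist

p = Fraction(9, 44)  # the "typical" value from the bullet above
dist = vector_probabilities(p)
best = max(dist.values())
most_likely = sorted(v for v, pr in dist.items() if pr == best)
print(most_likely, float(best))
```

Using `Fraction` keeps the arithmetic exact, so ties between vectors are detected reliably rather than being blurred by floating-point rounding.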

[Off topic] Good luck to all remaining teams... and may all future VAR decisions be non-controversial! :)


0
On BEST ANSWER

First let's list the possible vectors and their probabilities in terms of $p$ and $q=1-p$. As the number of draws determines the sum of the points, we can only get the same vector if we have the same number of draws.

\begin{array}{c|c|c} \text{#draws}&\text{vector}&\text{probability}\\\hline 6&[3,3,3,3]&p^6\\\hline 5&[5,3,3,2]&6p^5q\\\hline 4&[7,3,2,2]&3p^4q^2\\ 4&[5,5,3,1]&3p^4q^2\\ 4&[5,5,2,2]&3p^4q^2\\ 4&[5,4,3,2]&6p^4q^2\\\hline 3&[9,2,2,2]&\frac12p^3q^3\\ 3&[6,5,2,2]&\frac32p^3q^3\\ 3&[5,5,3,2]&\frac32p^3q^3\\ 3&[5,5,5,0]&\frac12p^3q^3\\ 3&[7,5,2,1]&3p^3q^3\\ 3&[7,4,2,2]&3p^3q^3\\ 3&[5,5,4,1]&3p^3q^3\\ 3&[7,4,3,1]&3p^3q^3\\ 3&[5,4,4,2]&3p^3q^3\\ 3&[4,4,4,3]&p^3q^3\\ \hline 2&[9, 4, 2, 1] & \frac32p^2q^4\\ 2&[7, 5, 4, 0] & \frac32p^2q^4\\ 2&[7, 4, 4, 1] & \frac94p^2q^4\\ 2&[7, 6, 2, 1] & \frac32p^2q^4\\ 2&[4, 4, 4, 4] & \frac38p^2q^4\\ 2&[5, 4, 4, 3] & \frac32p^2q^4\\ 2&[7, 7, 1, 1] & \frac38p^2q^4\\ 2&[7, 4, 3, 2] & \frac32p^2q^4\\ 2&[7, 5, 3, 1] & \frac32p^2q^4\\ 2&[6, 5, 4, 1] & \frac32p^2q^4\\ 2&[6, 4, 4, 2] & \frac32p^2q^4\\ \hline 1&[9, 4, 4, 0] & \frac38pq^5\\ 1&[7, 6, 4, 0] & \frac34pq^5\\ 1&[9, 4, 3, 1] & \frac34pq^5\\ 1&[9, 6, 1, 1] & \frac38pq^5\\ 1&[6, 6, 4, 1] & \frac34pq^5\\ 1&[6, 4, 4, 3] & \frac98pq^5\\ 1&[7, 6, 3, 1] & \frac34pq^5\\ 1&[7, 4, 3, 3] & \frac34pq^5\\ 1&[7, 7, 3, 0] & \frac38pq^5\\ \hline 0&[9,6,3,0]&\frac38q^6\\ 0&[9,3,3,3]&\frac18q^6\\ 0&[6,6,6,0]&\frac18q^6\\ 0&[6,6,3,3]&\frac38q^6 \end{array}

(I worked these out by hand down to $3$ draws, then I was missing one with $3$ draws and couldn't be bothered to figure it out and coded it up after all. :-)
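A table like the one above can be regenerated by brute force. The following is a minimal sketch, not the code originally used for this answer: it enumerates all $3^6$ outcomes and uses the fact that every outcome with $d$ draws has probability $p^d(q/2)^{6-d}$, so the coefficient of $p^dq^{6-d}$ is the number of such outcomes divided by $2^{6-d}$. The names `counts` and `coeff` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

games = list(combinations(range(4), 2))  # the 6 pairings of 4 teams

# counts[(d, v)] = number of outcomes with d draws and sorted points vector v
counts = Counter()
for outcome in product("DWL", repeat=6):  # D = draw, W = first team wins, L = it loses
    pts = [0, 0, 0, 0]
    for (a, b), result in zip(games, outcome):
        if result == "D":
            pts[a] += 1
            pts[b] += 1
        elif result == "W":
            pts[a] += 3
        else:
            pts[b] += 3
    counts[(outcome.count("D"), tuple(sorted(pts, reverse=True)))] += 1

# coefficient of p^d q^(6-d) with q = 1 - p
coeff = {key: Fraction(n, 2 ** (6 - key[0])) for key, n in counts.items()}

for (d, v), c in sorted(coeff.items(), key=lambda kv: (-kv[0][0], kv[0][1])):
    print(d, list(v), c)
```

A quick sanity check on any such table: for each number of draws $d$, the coefficients must add up to $\binom6d$.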

Only the vectors with the highest coefficients for each monomial can ever be the most likely, and it's straightforward to work out at which values of $p$ the crossovers among the monomials occur. The probability $3p^3q^3$ for the case of $3$ draws is the only one that never dominates:

\begin{array}{c|c|c|c}
\text{#draws}&\text{vector}&\text{probability}&\text{domain}\\\hline
6&[3,3,3,3]&p^6&p\in[\frac67,1]\\\hline
5&[5,3,3,2]&6p^5q&p\in[\frac12,\frac67]\\\hline
4&[5,4,3,2]&6p^4q^2&p\in[\frac{2\sqrt6-3}5,\frac12]\approx[0.38,\frac12]\\\hline
3&[7,5,2,1]&3p^3q^3\\
3&[7,4,2,2]&3p^3q^3\\
3&[5,5,4,1]&3p^3q^3\\
3&[7,4,3,1]&3p^3q^3\\
3&[5,4,4,2]&3p^3q^3\\ \hline
2&[7, 4, 4, 1] & \frac94p^2q^4&p\in[\frac13,\frac{2\sqrt6-3}5]\approx[\frac13,0.38]\\ \hline
1&[6, 4, 4, 3] & \frac98pq^5&p\in[\frac14,\frac13]\\ \hline
0&[9,6,3,0]&\frac38q^6&p\in[0,\frac14]\\
0&[6,6,3,3]&\frac38q^6&p\in[0,\frac14]
\end{array}
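For completeness, the domain boundaries can be checked by equating the leading monomials for neighbouring draw counts and substituting $q=1-p$. This is just arithmetic on the table entries:

$$\begin{aligned}
p^6=6p^5q &\iff p=6q &&\iff p=\tfrac67,\\
6p^5q=6p^4q^2 &\iff p=q &&\iff p=\tfrac12,\\
6p^4q^2=\tfrac94p^2q^4 &\iff \tfrac pq=\tfrac{\sqrt6}4 &&\iff p=\tfrac{\sqrt6}{4+\sqrt6}=\tfrac{2\sqrt6-3}5,\\
\tfrac94p^2q^4=\tfrac98pq^5 &\iff 2p=q &&\iff p=\tfrac13,\\
\tfrac98pq^5=\tfrac38q^6 &\iff 3p=q &&\iff p=\tfrac14.
\end{aligned}$$

The three-draw monomial gets squeezed out: $3p^3q^3\ge6p^4q^2$ requires $p\le\frac13$, whereas $3p^3q^3\ge\frac94p^2q^4$ requires $p\ge\frac37$, and these two ranges do not overlap.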

I wouldn't know how to derive this result just based on symmetry arguments :-)

0
On

There are $3^6=729$ ways to put between $0$ and $6$ arrows on the edges of a labeled $K_4$, where an arrow marks a decisive game and an edge without an arrow marks a draw. Let $q:={1-p\over2}$. An assignment containing $r\in [0..6]$ arrows then has probability $q^rp^{6-r}$. We now have to go through the $729$ cases and collect the probabilities of the various possible scoring vectors. I don't think it pays to set up a Pólya counting scheme in order to exploit the occurring symmetries. The resulting probabilities are all polynomials of degree $\leq6$ in $p$, and some search will be necessary to identify the most probable scoring vector in terms of $p$.

I went through the cases, and it turned out that there are $40$ possible scoring vectors, numbered $1$ to $40$ in my setup. I let Mathematica compute the resulting polynomial $s_j(p)$ for each of them, and then determined for the values $p_k={k\over400}$ $(0\leq k\leq400)$ which scoring vector had the highest probability. The result is in the following figure:

[Figure: the most probable scoring vector for each $p_k$]
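The grid scan described above was done in Mathematica; a rough Python stand-in might look like the sketch below. The names `weight`, `s` and `winners` are my own, the vectors are identified by their points tuples rather than by the numbers $1$ to $40$, and $q={1-p\over2}$ as defined in this answer.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, product

games = list(combinations(range(4), 2))  # the 6 edges of K_4

# weight[v][r] = number of assignments with r arrows that yield scoring vector v
weight = defaultdict(lambda: [0] * 7)
for outcome in product((0, 1, 2), repeat=6):  # 0 = no arrow, 1 = a beats b, 2 = b beats a
    pts = [0, 0, 0, 0]
    arrows = 0
    for (a, b), res in zip(games, outcome):
        if res == 0:
            pts[a] += 1
            pts[b] += 1
        else:
            pts[a if res == 1 else b] += 3
            arrows += 1
    weight[tuple(sorted(pts, reverse=True))][arrows] += 1

def s(v, p):
    """Probability of scoring vector v, with q = (1 - p) / 2 per arrow."""
    q = (1 - p) / 2
    return sum(n * q**r * p**(6 - r) for r, n in enumerate(weight[v]))

def winners(p):
    """Set of scoring vectors attaining the maximal probability at p."""
    probs = {v: s(v, p) for v in list(weight)}
    best = max(probs.values())
    return frozenset(v for v, pr in probs.items() if pr == best)

previous = None
for k in range(401):
    p = Fraction(k, 400)
    current = winners(p)
    if current != previous:  # report only the grid points where the winner changes
        print(f"k={k:3d}  p={float(p):.4f}  {sorted(current)}")
        previous = current
```

On a grid of step $1/400$ a change of winner can only be located to within one grid step. Exact ties, which occur when a crossover happens to fall on a grid point, show up as sets containing more than one vector.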

Inspection of this figure allows one to determine a posteriori the $p$-values where the jumps take place. They are $${1\over4},\quad{1\over3},\quad{2\sqrt{6}-3\over5}\approx0.3798,\quad{1\over2},\quad{6\over7}\ ,$$ as in Joriki's answer. Note that some of the probabilities $s_j(p)$ have respectable values, as can be seen from the following figure which shows a plot of all $40$ functions $p\mapsto s_j(p)$:

[Figure: plot of all $40$ functions $p\mapsto s_j(p)$]