Team Ranking Model vs Graph Theory Model

157 Views Asked by At

Suppose we have $4$ sports teams, $\{T_1, T_2, T_3, T_4\}$, and we have to following records:

$T_1$ beat $T_2$ by a score of 4

$T_1$ beat $T_4$ by a score of 2

$T_2$ beat $T_3$ by a score of 1

$T_2$ loses to $T_4$ by a score of 7

$T_3$ and $T_4$ tied

We want to know what is the ranking of these teams (from best team to worst team).

Consider the Kenneth Massey's team ranking model, a very simply model that assigns $r_i$ to team $T_i$ to consider the difference between team scores. So we have the linear system:

$$r_1 - r_2 =4, r_1 -r_4 = 2, r_2 - r_3 = 1, r_2 - r_4 = -7, r_3 - r_4 = 0$$

that can be represented by the matrix:

$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{bmatrix} \cdot \begin{bmatrix}r_1 \\ r_2 \\ r_3 \\ r_4 \end{bmatrix} = \begin{bmatrix}4 \\ 2 \\ 1 \\ -7 \\ 0 \end{bmatrix}$$

We solve for the least-square solution to $r_i$ and get:

$$\begin{bmatrix}r_1 \\ r_2 \\ r_3 \\ r_4 \end{bmatrix} = \begin{bmatrix} 1.125 \\ -3.75 \\ -2.375 \\ 0 \end{bmatrix} + r_4 \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$

So we have a team ranking of $T_1 > T_4 > T_3 > T_2$


I can't help but to think of the problem in terms of an adjacency matrix for a graph. Consider this approach:

Initialize a graph with $4$ vertices, each representing its respective team $T_i$, draw an directed edge with edge weight $w_{ji}$ from $T_i$ to $T_j$ if $T_i$ beat $T_j$ by $w_{ji}$. If a team lost, reverse the direction of the edge so that all edges have positive weight (ignore ties).

I notices if we consider the sum of each vertex's outgoing edges minus its incoming edges, and rank all team accordingly, we get the same ranking. That is $T_1$ have $2+4 = 6$, $T_2$ have $1 - 4 - 7 = -10$, and so on. Sorting these gives us $T_1 > T_4 > T_3 > T_2$, same solution as above.


So I'm wondering if this these two "models" will always produce the same result? My hypothesis is that it does, but I'm not certain.

1

There are 1 best solutions below

0
On BEST ANSWER

Not necessarily. Consider the (rather boring) set of outcomes:

  • $T_1$ and $T_2$ tied.
  • $T_1$ and $T_4$ tied.
  • $T_2$ and $T_3$ tied.
  • $T_2$ beat $T_4$ by a score of $2$.
  • $T_3$ loses to $T_4$ by a score of $3$.

In the "graph theory" model, $T_1$ has a score of $0$ and $T_4$ has a score of $1$, so $T_1$ loses to $T_4$.

However, in the team ranking model, the least-squares solution we get is $(\frac18, \frac14, -\frac{11}{8}, 0) + r_4 (1,1,1,1)$, giving $T_1$ a slight edge over $T_4$.


If we add a sixth game between $T_1$ and $T_3$, then this is not possible. In that case, we can compute the answer in terms of a vector of six scores $(s_1,s_2,s_3,s_4,s_5,s_6)$, for both algorithms. The team ranking model then gives us $$ (r_1,r_2,r_3,r_4) = \left(\frac{s_1+s_2+s_6}{4},\frac{-s_1+s_3+s_4}{4}, \frac{-s_3+s_5-s_6}{4}, \frac{-s_2-s_4-s_5}{4}\right) $$ up to adding a multiple of $(1,1,1,1)$, which is just what the graph theory model tells us.

When $T_1$ and $T_3$ don't play a game, the graph theory model acts as if they tie; the team ranking model "makes a guess" about the hypothetical outcome of that game which is a linear combination of the other scores. This gives us a potential discrepancy.

In the example above, it's reasonable to suppose that if $T_1$ and $T_3$ played, then $T_1$ would win. After all, the first two games suggest that $T_1$ is about as good as $T_2$ and $T_4$; the average of the scores of $T_2$ and $T_4$ versus $T_3$ is $\frac32$. This is in fact the "guess" that the team ranking model ends up making about what the score of $T_1$ versus $T_3$ would be if they played.


To elaborate on the "guessing" that I refer to: of course, the team ranking algorithm doesn't actually involve making such a guess. Rather, this is an emergent feature of the model. If we only have five scores $(s_1,s_2,s_3,s_4,s_5)$, then up to adding a multiple of $(1,1,1,1)$ the team ranking model assigns the scores $$ (r_1,r_2,r_3,r_4) = \left(\frac{3s_1 + 3s_2 + s_3 - s_5}{8}, \frac{-s_1+s_3+s_4}{4}, \frac{-s_1-s_2-3s_3+3s_5}{8}, \frac{-s_2-s_4-s_5}{4}\right) $$ which exactly corresponds to setting $s_6 = \frac{s_1+s_2+s_3-s_5}{2}$ in the previous solution. So in some sense, $\frac{s_1+s_2+s_3-s_5}{2}$ is the model's guess at the what would happen if $T_1$ and $T_3$ had played. (It is the average of $s_1+s_3$ and $s_2-s_5$: the guesses at the outcome based on how $T_1$ and $T_3$ both played against $T_2$, and based on how $T_1$ and $T_3$ both played against $T_4$.)