I am helping design a league system for my tennis club and the way that players have been rated for promotion or demotion from a level in the league seems flawed to me.
My argument is that at the end of a session, you total the number of points won by each player and base promotion/demotion on the absolute number for each player. But my colleague argues that, to be fair to players who have not played all their matches, we should take each player's average score per match and base promotion/demotion on that.
Consider a league where there are $10$ players at a level and player $A$ plays just one match but player $B$ plays all $9$ other players.
Player $A$ wins $1$ match, player $B$ wins $7$ but loses $2$ matches.
A match win means $3$ points, a loss means $1.$
Player $A$'s average score is $3/1 = 3$
Player $B$'s average score is $((3 \times 7) + (2 \times 1))/9 = 23/9 \approx 2.56$
On a total-only system player $B$ scores better than player $A$ ($23$ points against $3$). But on the average system player $A$ scores better.
My view is that it is pretty easy to win one match. But to win ALL of $9$ matches is way harder. Also, player $A$ can select someone in the level that they know they can beat. So I think the average idea is very BAD!
Am I right?
How can I argue my case?
I'd argue that the sum is not a good measure of player strength in situations where players can play different numbers of games. Note that a player's total score increases monotonically with the number of games played, because even a loss earns a point. A player who plays many games and loses them all can therefore get a higher total score than a player who plays only a small number of games and wins them all. It doesn't make sense for a player to climb to the top of the leaderboard simply by playing poorly in a lot of games. In the extreme, you can have a top-ranked player who's never won a single game!
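To make this concrete, here is a minimal sketch using the scoring from the question ($3$ points for a win, $1$ for a loss). Players A and B are the ones from the question. Player C, who loses all $9$ matches, is a hypothetical addition for illustration, as are the function names.

```python
WIN_POINTS = 3   # points for a win, as stated in the question
LOSS_POINTS = 1  # points for a loss, as stated in the question


def total_points(wins, losses):
    """Total score under the sum-based system."""
    return WIN_POINTS * wins + LOSS_POINTS * losses


def average_points(wins, losses):
    """Score per match under the average-based system."""
    return total_points(wins, losses) / (wins + losses)


# (wins, losses): A and B come from the question, C is hypothetical.
players = {"A": (1, 0), "B": (7, 2), "C": (0, 9)}

for name, (wins, losses) in players.items():
    total = total_points(wins, losses)
    average = average_points(wins, losses)
    print(f"{name}: total={total}, average={average:.2f}")
```

On totals, C (who never won) finishes ahead of A (who never lost). On averages, A finishes ahead of B despite having played a single match. Neither ordering looks like a sensible measure of strength.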
What you really want is something that respects both the proportion of wins and the absolute number of wins. To do this, you can look at a confidence interval on each player's win rate:
Use the total number of games and number of wins to find a player's win rate. You can then compute the confidence interval around this proportion as illustrated here. The lower bound of this confidence interval gives you the win rate that you're confident the player is at least as good as. This approach has the benefit of respecting both the actual proportion of wins and the number of games played. A player who wins 5 of 10 games, for example, is 95% likely to have a true win rate of at least 24%. A player who wins 10 of 20 games, however, is 95% likely to have a true win rate of at least 30%, despite having the exact same win rate as the player who played fewer games. A player who wins 9 of 20 games has a similar lower confidence bound to the 5/10 player (about 24% to 26%, depending on which interval formula is used), so even though their observed win rate is slightly lower than the 5/10 player's, their higher absolute number of wins makes you more sure that their win rate is not just a fluke.
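The answer does not say which interval formula it uses, so the following is only a sketch under one assumption: the Wilson score interval with $z = 1.96$ (a two-sided 95% interval). The function name `wilson_lower_bound` is made up for this example.

```python
import math


def wilson_lower_bound(wins, games, z=1.96):
    """Lower end of the Wilson score interval for a win rate.

    z=1.96 corresponds to a two-sided 95% interval; this choice of
    interval is an assumption, not something stated in the answer.
    """
    if games == 0:
        return 0.0  # no evidence at all, so claim nothing
    p = wins / games
    z2 = z * z
    centre = p + z2 / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games))
    return (centre - margin) / (1 + z2 / games)


for wins, games in [(5, 10), (10, 20), (9, 20)]:
    print(f"{wins}/{games}: {wilson_lower_bound(wins, games):.3f}")
```

Under this assumption the lower bounds come out at roughly 0.24 for 5/10, 0.30 for 10/20 and 0.26 for 9/20. Other interval formulas give somewhat different figures, but the pattern is the same: more games at the same win rate means a higher lower bound.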
If you rank players by the lower confidence bound of their win rate, you achieve the dual goal of encouraging both a high rate and a high absolute number of wins. It has the nice feature that between two players with the same win rate, the one who has played more games will be ranked higher. Note this does not respect difficulty of matches, so a player could pad their win rate/count by consistently playing the worst players in the league.
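As a rough illustration of the ranking, the sketch below sorts a few win/games records by their lower confidence bound. It again assumes the Wilson score interval with $z = 1.96$ and repeats the helper so the block runs on its own. Players A and B are from the question; D and E are hypothetical.

```python
import math


def wilson_lower_bound(wins, games, z=1.96):
    """Wilson score lower bound (assumed interval; z=1.96 is two-sided 95%)."""
    if games == 0:
        return 0.0
    p = wins / games
    z2 = z * z
    centre = p + z2 / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games))
    return (centre - margin) / (1 + z2 / games)


# name -> (wins, games played); D and E are hypothetical players.
records = {"A": (1, 1), "B": (7, 9), "D": (5, 10), "E": (10, 20)}

# Highest lower bound first.
ranking = sorted(
    records,
    key=lambda name: wilson_lower_bound(*records[name]),
    reverse=True,
)

for name in ranking:
    wins, games = records[name]
    print(f"{name}: {wins}/{games} -> {wilson_lower_bound(wins, games):.3f}")
```

With these records the order is B, E, D, A. Player A's single win carries so little evidence that it ranks below even the 50% players, which matches the intuition in the question that one win should not beat seven wins out of nine.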