My wife and I play a lot of rummy against each other, and we keep a record of the score for each game, as well as the accumulated score. Since I have a substantial lead at this point, I claim that I'm a better rummy player than she is, while she argues that it's pure luck.
Most people agree that rummy is a game that combines luck and skill, but for many other games it can be hard to decide.
Assume we have the following list of scores from a game, but we know nothing about the game.
Round   :  1  2  3  4  5  6  7  8  9 10 | Sum  Avg  Wins
Player A:  -  -  5  2  -  - 17  -  -  - |  24  2.4     3
Player B: 12  3  -  -  8  4  -  5 17  4 |  53  5.3     7
Is it possible to examine the scores and decide how much luck/skill is involved? It seems reasonable to me that both the total points and number of wins are important, but are there also other factors?
The simplest analysis is for the number of wins. Let $n$ be the number of independent games, and $X$ be the number of wins for Player B. Then under the null hypothesis that the players are equally likely to win any one game, $X \sim \mathsf{Binom}(n, 1/2)$.
If B is in the lead with $x > n/2$ observed wins, we would reject the null hypothesis (and conclude that B is more skillful) if $P(X \ge x)$ is remarkably small. A probability smaller than 0.025 might be a reasonable criterion.
In your case, $n = 10$ and $P(X \ge 7) = 1 - P(X \le 6) = 0.1719,$ which is not remarkably small. (The computation in R statistical software is shown below.)
If we had $x = 9$ then $P(X \ge x) = 0.0107,$ which might be persuasive to you or a neutral party, but possibly not to your wife who starts with a deeply held belief that rummy is a game of chance.
It is not the proportion of wins alone that governs the outcome; the number of games matters too. If Player B had $x = 35$ observed wins in $n = 50$ games (the same 70% as 7 wins in 10), then the computation would be $P(X \ge 35) = 1 - P(X \le 34) = 0.0033,$ a persuasive result. So if you maintain your proportionate lead for a larger number of games, then you have a good case that you are more skillful.
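The tail probabilities quoted above can be checked with a few lines of code. This is a minimal sketch in Python using only the standard library; the function name `binom_upper_tail` is made up for illustration. In R the same quantities would be `1 - pbinom(6, 10, 0.5)`, `1 - pbinom(8, 10, 0.5)` and `1 - pbinom(34, 50, 0.5)`.

```python
from math import comb


def binom_upper_tail(x, n, p=0.5):
    """Return P(X >= x) for X ~ Binom(n, p), summed directly from the pmf."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


# The three cases discussed in the text: (observed wins, number of games).
for x, n in [(7, 10), (9, 10), (35, 50)]:
    print(f"P(X >= {x}) for n = {n}: {binom_upper_tail(x, n):.4f}")
# P(X >= 7) for n = 10: 0.1719
# P(X >= 9) for n = 10: 0.0107
# P(X >= 35) for n = 50: 0.0033
```

The first and last cases have the same 70% win rate, yet only the 50-game record gives a tail probability below the 0.025 criterion.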
If you want to look in a statistics textbook under 'one-sample binomial test', you can find a more-technical explanation. Also, perhaps something about a normal approximation, which might make sense for $n = 50,$ but not for as few as $n = 10$ games.
Using scores would require a knowledge of rummy that I do not have. If the winner has score $12,$ does it make sense to say that the loser has score $-12?$ If so, here is a formal statistical test.
I find it hard to believe that these scores are normal, so I'm using a one-sample Wilcoxon ('signed-rank') test of the hypothesis that the population median score is $0$ against the two-sided alternative that it is not. This test does not require normal data.
The results from R statistical software are shown below. To be persuasive, one would need the P-value to be less than about 0.05. (The warning message has to do with the two 4's in the data, and the fact that the sample median is 4; I investigated this and found that the exact P-value must be above 0.20.) Again here, it is possible that data from more games with a continuing difference between players would yield significant results.
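The signed-rank computation can also be sketched by hand, which sidesteps the ties problem. The Python below is an illustration, not the R run referred to above. It assumes the signed scores are taken from Player B's point of view (a round won by A with score $s$ counts as $-s$), gives tied absolute values the average of their ranks, and gets an exact two-sided P-value by enumerating all $2^{10}$ sign assignments. The names `midranks` and `signed_rank_exact` are made up for this example.

```python
from itertools import product

# Signed scores per round from Player B's point of view.
# ASSUMPTION: a round won by A with score s is recorded as -s.
scores = [12, 3, -5, -2, 8, 4, -17, 5, 17, 4]


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def signed_rank_exact(data):
    """Return (W+, exact two-sided P-value) for H0: median = 0."""
    nonzero = [d for d in data if d != 0]
    ranks = midranks([abs(d) for d in nonzero])
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    # Under H0 each rank is equally likely to carry a + or - sign,
    # so enumerate every sign pattern and record the resulting W+.
    totals = [
        sum(r for r, positive in zip(ranks, signs) if positive)
        for signs in product((0, 1), repeat=len(ranks))
    ]
    upper = sum(t >= w_plus for t in totals) / len(totals)
    lower = sum(t <= w_plus for t in totals) / len(totals)
    return w_plus, min(1.0, 2 * min(upper, lower))


w_plus, p_value = signed_rank_exact(scores)
print(f"W+ = {w_plus}, exact two-sided P-value = {p_value:.4f}")
```

Full enumeration is only practical for small samples like this one. For many more games a normal approximation or a library routine would be the usual choice.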