I have a daily quiz where a user has 3 attempts to answer correctly, so each day they either answer correctly on attempt 1, attempt 2 or attempt 3, or get it wrong. I store the daily answer rate for each individual date and can calculate how well a user performed on any individual day.
What I'm struggling to work out is how to calculate a user's ranking percentage across all daily questions.
I know how many games each user has played, how many they answered correctly or wrongly, and on which attempt. I also store the overall correct rate across all users, as well as the count for each individual attempt (and the total number of wrong answers).
So the cumulative stats across, say, 20 users for 100 questions (bearing in mind that each person could have answered a different number in that time) might look like this:
gamesWon: 50;
gamesLost: 50;
attempt1: 12;
attempt2: 20;
attempt3: 18;
And for a particular user who has played 10 games in total, I might have:
gamesPlayed: 10;
gamesWon: 8;
gamesLost: 2;
attempt1: 4;
attempt2: 2;
attempt3: 2;
What would be a good calculation to represent "You are currently in the top X% of users overall" for a particular user?
Am I even collecting the stats in a meaningful way to do this?
Thanks in advance
Let's model this probabilistically. Let $X_{ij}$ be the number of attempts the $i$th individual would need to answer the $j$th question correctly. We can model each game as $$X_{ij} \sim \text{Geometric}(p_i)$$
Here, $p_i$ is the probability that the $i$th individual answers correctly on any single attempt (attempts are assumed independent), so a higher $p_i$ represents a more competent individual.
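To make the model concrete, here is a minimal Python sketch that simulates one player under these assumptions: each attempt succeeds independently with probability `p`, and a game counts as lost after 3 failed attempts. The function name `simulate_games` and its parameters are hypothetical; the returned counts correspond to the question's attempt1, attempt2, attempt3 and gamesLost stats.

```python
import random


def simulate_games(p: float, n_games: int, max_attempts: int = 3, seed: int = 0) -> list[int]:
    """Hypothetical helper: counts of wins on attempts 1..max_attempts, then losses."""
    rng = random.Random(seed)  # seeded so results are reproducible
    counts = [0] * (max_attempts + 1)
    for _ in range(n_games):
        for attempt in range(max_attempts):
            # Each attempt succeeds independently with probability p (geometric model).
            if rng.random() < p:
                counts[attempt] += 1
                break
        else:
            # No attempt succeeded: we only learn that X_ij > max_attempts.
            counts[max_attempts] += 1
    return counts
```

Losses are exactly the games where the geometric variable would have exceeded 3, which is why they enter the likelihood as $(1-p_i)^3$.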
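Of course, we don't observe anything past attempt 3, so we have to work with a censored version of the distribution: a lost game only tells us that $X_{ij} > 3$. For individual $i$, the likelihood is \begin{align*} \mathcal{L}(p_i) = p_i^{N_{i1}}[(1-p_i)p_i]^{N_{i2}}[(1-p_i)^2p_i]^{N_{i3}}[(1-p_i)^3]^{N_{i4}} \end{align*} where $N_{i1}, N_{i2}, N_{i3}, N_{i4}$ are the numbers of Attempt 1 wins, Attempt 2 wins, Attempt 3 wins, and games lost, respectively, by individual $i$. Taking logs gives $\log\mathcal{L}(p_i) = (N_{i1}+N_{i2}+N_{i3})\log p_i + (N_{i2}+2N_{i3}+3N_{i4})\log(1-p_i)$, and setting its derivative with respect to $p_i$ to zero gives the maximum likelihood estimator (MLE) \begin{align*} \widehat{p}_i = \frac{N_{i1} + N_{i2} + N_{i3}}{N_{i1} + 2N_{i2} + 3N_{i3} + 3N_{i4}} \end{align*} that is, the number of correct answers divided by the total number of attempts made (a lost game uses all 3 attempts). This can then be used to rank individuals (the higher the $\widehat{p}_i$, the better).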
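A minimal Python sketch of this estimator, plus one possible way to turn it into the "top X%" figure from the question. `UserStats`, `mle_p` and `top_percent` are hypothetical names; the fields mirror the question's per-user counts, and "top X%" is assumed here to mean the percentage of users (including this one) whose $\widehat{p}$ is at least as high.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    attempt1: int    # games won on the first attempt
    attempt2: int    # games won on the second attempt
    attempt3: int    # games won on the third attempt
    games_lost: int  # games not answered correctly within 3 attempts


def mle_p(s: UserStats) -> float:
    """MLE of p under the censored geometric model: wins / total attempts."""
    wins = s.attempt1 + s.attempt2 + s.attempt3
    attempts = s.attempt1 + 2 * s.attempt2 + 3 * s.attempt3 + 3 * s.games_lost
    if attempts == 0:
        raise ValueError("user has not played any games")
    return wins / attempts


def top_percent(user: UserStats, everyone: list[UserStats]) -> float:
    """Assumed convention: % of users (this one included) scoring at least as high."""
    mine = mle_p(user)
    at_least = sum(1 for other in everyone if mle_p(other) >= mine)
    return 100.0 * at_least / len(everyone)


# The example user from the question: 4, 2, 2 wins on attempts 1-3 and 2 losses.
print(mle_p(UserStats(attempt1=4, attempt2=2, attempt3=2, games_lost=2)))  # 0.4
```

Note that this raw estimate treats a user with one perfect game the same as one with hundreds, which is the issue the addendum addresses.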
Addendum: accounting for the number of games played

Consider the scores of these two individuals:
Based on this system, we have $\widehat{p}_1 = 0.971$ and $\widehat{p}_2 = 1$, so Individual 2 is ranked higher. This seems wildly unfair, since Individual 2's ranking rests on so few games; perhaps a fairer approach is to rank based on \begin{align*} \widetilde{p}_i: \widetilde{p}_i \text{ is the largest value } < \widehat{p}_i \text{ such that } (\widehat{p}_i - \widetilde{p}_i)^2 = z^* \frac{1 - \widetilde{p}_i}{N_i \widetilde{p}_i^2} \end{align*} where $N_i = \sum_{k=1}^{4}N_{ik}$ is the number of games played by individual $i$ and $z^*$ is a tuning parameter (for simplicity, you can set $z^*=1$). This is similar to the lower bound of Wilson's score interval, with $(1-\widetilde{p}_i)/(N_i\widetilde{p}_i^2)$, which is based on the variance of a full (uncensored) geometric distribution, serving as an approximation for the variance of $\widehat{p}_i$. From this procedure, we find that $\widetilde{p}_1 = 0.946$ and $\widetilde{p}_2 = -0.755$ (the adjusted score need not lie in $[0, 1]$; only the ordering matters), and therefore Individual 1 outranks Individual 2.
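The adjusted score has no simple closed form, but it is easy to find numerically. The Python sketch below assumes the equation $(\widehat{p}_i - \widetilde{p}_i)^2 = z^*(1-\widetilde{p}_i)/(N_i\widetilde{p}_i^2)$; `lower_score` and its `step`/`floor` parameters are hypothetical choices. It scans downward from $\widehat{p}_i$ for the first sign change of the difference between the two sides (treating the pole at $\widetilde{p}_i = 0$ as $-\infty$, which is the sign the difference takes on both sides of it) and then refines that bracket by bisection.

```python
import math


def lower_score(p_hat: float, n_games: int, z: float = 1.0,
                step: float = 1e-3, floor: float = -10.0) -> float:
    """Hypothetical helper: largest p < p_hat with (p_hat - p)^2 = z*(1 - p)/(n_games*p^2)."""
    if n_games <= 0:
        raise ValueError("n_games must be positive")

    def g(p: float) -> float:
        # Left-hand side minus right-hand side of the equation.
        if p == 0.0:
            return -math.inf  # the right-hand side blows up at p = 0
        return (p_hat - p) ** 2 - z * (1.0 - p) / (n_games * p * p)

    # Coarse scan downward from p_hat until g becomes non-negative.
    hi = p_hat  # g(p_hat) <= 0 for 0 < p_hat <= 1
    lo = hi - step
    while g(lo) < 0:
        hi, lo = lo, lo - step
        if lo < floor:
            raise ValueError("no root found above the floor")

    # Bisection keeps g(lo) >= 0 and g(hi) <= 0 while shrinking the bracket.
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# One game won on the first attempt: p_hat = 1, N = 1, z* = 1.
print(round(lower_score(1.0, 1), 3))  # -0.755
```

A coarse scan plus bisection was chosen over a general-purpose solver so the sketch needs only the standard library and always returns the root closest to $\widehat{p}_i$ from below.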