Consider two restaurants $A$ and $B$ with sets of online reviews $S_A$ and $S_B$, respectively, where each review is an integer score from 1 to 5. What is the probability that a new review for $B$, $r_B$, will have a higher score than a new review for $A$, $r_A$?
I tried to solve this problem by considering the distribution of reviews for each restaurant and treating $r_A$ and $r_B$ as independent, that is $$P(r_B > r_A) = \sum_{i=1}^{5}P(r_A = i)P(r_B > i)$$ where $P(r_A = i)$ is the fraction of $S_A$ equal to $i$ and $P(r_B > i)$ is the fraction of $S_B$ greater than $i$.
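As an illustration, the plug-in formula can be sketched in Python. The function name `prob_b_beats_a`, the convention that `counts[k]` holds the number of reviews with score `k + 1`, and the counts for $A$ (made up so that the mean is 4.8) are all hypothetical; like the formula, the sketch assumes $r_A$ and $r_B$ are independent.

```python
from fractions import Fraction

def prob_b_beats_a(counts_a, counts_b):
    """Plug-in estimate of P(r_B > r_A) from raw review counts (no prior).

    counts_x[k] is the number of reviews with score k + 1, for scores 1..5.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = Fraction(0)
    for i in range(5):                                   # i indexes score i + 1
        p_a_eq = Fraction(counts_a[i], n_a)              # P(r_A = i + 1)
        p_b_gt = Fraction(sum(counts_b[i + 1:]), n_b)    # P(r_B > i + 1)
        total += p_a_eq * p_b_gt
    return total

# Hypothetical counts: A has 1000 reviews with mean 4.8, B has one 5-star review.
counts_a = [0, 0, 0, 200, 800]
counts_b = [0, 0, 0, 0, 1]
print(prob_b_beats_a(counts_a, counts_b))  # P(r_B > r_A): 1/5
print(prob_b_beats_a(counts_b, counts_a))  # P(r_A > r_B), arguments swapped: 0
```

With these made-up counts, $B$ beats $A$ with probability 1/5 and never loses, purely on the strength of one review.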
However, I've noticed that this is only reliable when both $A$ and $B$ have a sufficiently large number of reviews. For example, if $A$ had 1000 reviews with an average score of 4.8 and $B$ had a single review with a score of 5, then by the above equation $B$ would look like the better option, even though it has only one review. In other words, there is uncertainty in the estimated distribution of reviews. How does one take this uncertainty into account?
Let $P_A(s)$ and $P_B(s)$ be the (unknown) categorical distributions of scores for $A$ and $B$ over $\{1,\ldots,5\}$.
We can then model this with a Bayesian approach, which automatically takes the sample size into account.
The conjugate prior of a categorical distribution is the Dirichlet distribution, which here lives on the 4-simplex (five probabilities that sum to 1). You can choose the uniform Dirichlet, with concentration parameter $\alpha = (1,1,1,1,1)$, to avoid favoring any score a priori (unless you have a reason to think a particular score is more likely for that restaurant).
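For reference, the conjugacy step can be written out as follows, with $\theta = (\theta_1,\ldots,\theta_5)$ denoting one restaurant's unknown score probabilities (notation introduced here) and $c_i$ the observed number of reviews with score $i$. Multiplying prior and likelihood just adds the counts to the concentration parameters, and the predictive probability of score $i$ is the posterior mean of $\theta_i$:

$$\begin{aligned}
p(\theta \mid \alpha) &\propto \prod_{i=1}^{5} \theta_i^{\alpha_i - 1} && \text{(Dirichlet prior)} \\
p(\mathbf{c} \mid \theta) &\propto \prod_{i=1}^{5} \theta_i^{c_i} && \text{(categorical likelihood)} \\
p(\theta \mid \mathbf{c}, \alpha) &\propto \prod_{i=1}^{5} \theta_i^{c_i + \alpha_i - 1} \;\Longrightarrow\; \theta \mid \mathbf{c} \sim \operatorname{Dir}(\alpha + \mathbf{c}) \\
P(r = i \mid \mathbf{c}, \alpha) &= \mathbb{E}[\theta_i \mid \mathbf{c}, \alpha] = \frac{c_i + \alpha_i}{N + \sum_{j=1}^{5} \alpha_j}
\end{aligned}$$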
With this in place, we can calculate the posterior predictive distribution of a new review for each restaurant. Since we used a conjugate prior, the posterior predictive distribution has a closed form:
$$P(r_X = i|\mathbf{c},\alpha) = \frac{c_i+\alpha_i}{N + \sum_{j=1}^5 \alpha_j}$$
where $c_i$ is the number of reviews for restaurant $X$ with score $i$, and $N = \sum_{j=1}^5 c_j$ is the total number of reviews for $X$.
Assuming the uniform prior $\alpha = (1,1,1,1,1)$, this simplifies to:
$$P(r_X = i|\mathbf{c},\alpha) = \frac{c_i+1}{N + 5}$$
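A minimal Python sketch of this posterior predictive, assuming the hypothetical convention that `counts[k]` is the number of reviews with score `k + 1`; the function name `posterior_predictive` is also illustrative. Exact fractions are used so the output matches the formula term by term.

```python
from fractions import Fraction

def posterior_predictive(counts, alpha=(1, 1, 1, 1, 1)):
    """Dirichlet-categorical posterior predictive P(r = i | c, alpha).

    counts[k] is the number of reviews with score k + 1; alpha defaults
    to the uniform prior (1, 1, 1, 1, 1).
    """
    denom = sum(counts) + sum(alpha)          # N + sum_j alpha_j
    return [Fraction(c + a, denom) for c, a in zip(counts, alpha)]

# A restaurant with a single 5-star review.
print([str(p) for p in posterior_predictive([0, 0, 0, 0, 1])])
# ['1/6', '1/6', '1/6', '1/6', '1/3']
```

With a single 5-star review, the predictive puts only 1/3 on a score of 5, compared with 1 under the raw fractions.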
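This means that if we have 1000 observations of $A$ and only 5 of $B$, then the predictive distribution for $A$ is driven almost entirely by the data, whereas for $B$ it is still pulled noticeably toward the flat prior, which tempers the extreme noise of small sample sizes. Concretely, $\frac{c_i+1}{N+5} = \frac{N}{N+5}\cdot\frac{c_i}{N} + \frac{5}{N+5}\cdot\frac{1}{5}$, so the prior gets weight $\frac{5}{N+5}$: one half for $N = 5$, but under 0.5% for $N = 1000$.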
You can then plug the posterior predictive distributions for $A$ and $B$ into the formula from your post in place of the raw fractions.
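Putting the pieces together, a minimal Python sketch of the full comparison might look like this. The helper names (`posterior_predictive`, `compare`) and the counts for $A$ are hypothetical, and, as in the question's formula, new reviews of $A$ and $B$ are assumed independent given the data. It also reports ties, which the strict inequality leaves out.

```python
from fractions import Fraction

def posterior_predictive(counts, alpha=(1, 1, 1, 1, 1)):
    # Dirichlet-categorical posterior predictive; counts[k] = number of reviews with score k + 1.
    denom = sum(counts) + sum(alpha)
    return [Fraction(c + a, denom) for c, a in zip(counts, alpha)]

def compare(counts_a, counts_b, alpha=(1, 1, 1, 1, 1)):
    """Return (P(r_B > r_A), P(r_A > r_B), P(r_A = r_B)) under the posterior predictives."""
    p_a = posterior_predictive(counts_a, alpha)
    p_b = posterior_predictive(counts_b, alpha)
    b_wins = sum(p_a[i] * sum(p_b[i + 1:]) for i in range(5))  # sum_i P(r_A = i) P(r_B > i)
    a_wins = sum(p_b[i] * sum(p_a[i + 1:]) for i in range(5))  # sum_i P(r_B = i) P(r_A > i)
    tie = sum(p_a[i] * p_b[i] for i in range(5))
    return b_wins, a_wins, tie

# Hypothetical counts: A has 1000 reviews with mean 4.8, B has one 5-star review.
b_wins, a_wins, tie = compare([0, 0, 0, 200, 800], [0, 0, 0, 0, 1])
print(f"P(B>A)={float(b_wins):.3f}  P(A>B)={float(a_wins):.3f}  tie={float(tie):.3f}")
# P(B>A)=0.069  P(A>B)=0.632  tie=0.300
```

For these made-up counts, the raw fractions would give $P(r_B > r_A) = 1/5$ and $P(r_A > r_B) = 0$, whereas the posterior predictives give roughly 0.069 versus 0.632, so $A$ comes out ahead once $B$'s single review is weighed against the prior.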