I'm building a scoring system with score from 0 to 5) and I would like to sort products according to the number of reviews and their scores. After some research on the Internet I have found two formulas to get a good ranking system.
Which one is the Bayesian formula ?
What is the other formula ?
Which one is valid ?
I have tried to explain the problem using a simple reviews table and SQL queries, here's the reviews table.
Reviews table
Product | Score
Product A | 3 \
Product A | 3 |
Product A | 3 |-> 5 reviews
Product A | 3 |
Product A | 3 /
Product B | 2 \
Product B | 2 |
Product B | 2 |
Product B | 2 |
Product B | 2 |-> 10 reviews
Product B | 2 |
Product B | 2 |
Product B | 2 |
Product B | 2 |
Product B | 2 /
Product C | 1 \
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 |-> 15 reviews
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 |
Product C | 1 /
Method A, general formula:
(AVG_NUM_VOTE * AVG_RATING) + (PRODUCT_NUM_VOTES * PRODUCT_AVG_SCORE)
---------------------------------------------------------------------
(AVG_NUM_VOTE + PRODUCT_NUM_VOTES)
AVG_NUM_VOTE: SELECT COUNT(product)/COUNT(DISTINCT product) FROM reviews => 10
AVG_RATING: SELECT AVG(score) FROM reviews => 1.666666...
Now, to get the value for each product:
Product A:
PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product A' = 5
PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product A' = 3
(10 * 1.66666) + (5 * 3)
------------------------ = 2.111111...
10 + 5
Product B:
PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product B' = 10
PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product B' = 2
(10 * 1.66666) + (10 * 2)
------------------------- = 1.833333...
10 + 10
Product C:
PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product C' = 15
PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product C' = 1
(10 * 1.66666) + (15 * 1)
------------------------- = 1.266666666
10 + 15
Method B, general formula:
TOTAL_SUM + (PRODUCT_NUM_VOTE * PRODUCT_AVG_SCORE)
--------------------------------------------------
(TOTAL_VOTE + PRODUCT_NUM_VOTE)
TOTAL_SUM: SELECT SUM(count) FROM ( SELECT COUNT(DISTINCT product) AS count FROM reviews ) AS SUM = 5*3 + 10*2 + 15*1 = 50
TOTAL_VOTE: SELECT COUNT(product) FROM reviews = 30
Product A:
PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product A' = 5
PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product A' = 3
50 + (5 * 3)
------------ = 1.857142
30 + 5
Product B:
PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product B' = 10
PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product B' = 2
50 + (10 * 2)
------------- = 1.75
30 + 10
Product C:
PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product C' = 15
PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product C' = 1
50 + (15 * 1)
------------- = 1.444444
30 + 15
Now if I compare the two methods:
Product | Method A | Method B
Product A | 2.111111 | 1.857142
Product B | 1.833333 | 1.75
Product C | 1.266666 | 1.444444
Which method is the Bayesian average ? And what is the other method ? What is the best formula to use ?
Thanks for reading...
Edit:
During my search, I've noticed that in method B:
TOTAL_SUM = (AVG_RATING * AVG_NUM_VOTE * (#products) )
TOTAL_VOTE = (AVG_NUM_VOTE * (#products) )
#products is the number of different products taken in account.
My guess is that the first formula is incomplete and we need to add something in it, like this:
(AVG_NUM_VOTE * AVG_RATING * #products) + (PRODUCT_NUM_VOTES * PRODUCT_AVG_SCORE)
---------------------------------------------------------------------------------
( (AVG_NUM_VOTE * #products) + PRODUCT_NUM_VOTES)
What do you think ?