About Bayesian formula and rating system

999 Views Asked by At

I'm building a scoring system with score from 0 to 5) and I would like to sort products according to the number of reviews and their scores. After some research on the Internet I have found two formulas to get a good ranking system.

Which one is the Bayesian formula ?

What is the other formula ?

Which one is valid ?

I have tried to explain the problem using a simple reviews table and SQL queries, here's the reviews table.

Reviews table

Product   | Score
Product A | 3      \
Product A | 3      |
Product A | 3      |-> 5 reviews
Product A | 3      |
Product A | 3      /
Product B | 2      \
Product B | 2      |
Product B | 2      |
Product B | 2      |
Product B | 2      |-> 10 reviews
Product B | 2      |
Product B | 2      |
Product B | 2      |
Product B | 2      |
Product B | 2      /
Product C | 1      \
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      |-> 15 reviews
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      |
Product C | 1      /

Method A, general formula:

(AVG_NUM_VOTE * AVG_RATING) + (PRODUCT_NUM_VOTES * PRODUCT_AVG_SCORE)
---------------------------------------------------------------------
                 (AVG_NUM_VOTE + PRODUCT_NUM_VOTES)

AVG_NUM_VOTE: SELECT COUNT(product)/COUNT(DISTINCT product) FROM reviews => 10

AVG_RATING: SELECT AVG(score) FROM reviews => 1.666666...

Now, to get the value for each product:

Product A:

PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product A' = 5

PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product A' = 3

(10 * 1.66666) + (5 * 3)
------------------------ = 2.111111...
          10 + 5

Product B:

PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product B' = 10

PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product B' = 2

(10 * 1.66666) + (10 * 2)
------------------------- = 1.833333...
         10 + 10

Product C:

PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product C' = 15

PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product C' = 1

(10 * 1.66666) + (15 * 1)
------------------------- = 1.266666666
          10 + 15

Method B, general formula:

TOTAL_SUM + (PRODUCT_NUM_VOTE * PRODUCT_AVG_SCORE)
--------------------------------------------------
        (TOTAL_VOTE + PRODUCT_NUM_VOTE)

TOTAL_SUM: SELECT SUM(count) FROM ( SELECT COUNT(DISTINCT product) AS count FROM reviews ) AS SUM = 5*3 + 10*2 + 15*1 = 50

TOTAL_VOTE: SELECT COUNT(product) FROM reviews = 30

Product A:

PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product A' = 5

PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product A' = 3

50 + (5 * 3)
------------ = 1.857142
   30 + 5

Product B:

PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product B' = 10

PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product B' = 2

50 + (10 * 2)
------------- = 1.75
   30 + 10

Product C:

PRODUCT_NUM_VOTE: SELECT COUNT(product) FROM reviews WHERE product = 'Product C' = 15

PRODUCT_AVG_SCORE: SELECT AVG(score) FROM reviews WHERE product = 'Product C' = 1

50 + (15 * 1)
------------- = 1.444444
   30 + 15

Now if I compare the two methods:

Product   | Method A | Method B
Product A | 2.111111 | 1.857142
Product B | 1.833333 | 1.75
Product C | 1.266666 | 1.444444

Which method is the Bayesian average ? And what is the other method ? What is the best formula to use ?

Thanks for reading...

Edit:

During my search, I've noticed that in method B:

  • TOTAL_SUM = (AVG_RATING * AVG_NUM_VOTE * (#products) )

  • TOTAL_VOTE = (AVG_NUM_VOTE * (#products) )

#products is the number of different products taken in account.

My guess is that the first formula is incomplete and we need to add something in it, like this:

(AVG_NUM_VOTE * AVG_RATING * #products) + (PRODUCT_NUM_VOTES * PRODUCT_AVG_SCORE)
---------------------------------------------------------------------------------
               ( (AVG_NUM_VOTE * #products) + PRODUCT_NUM_VOTES)

What do you think ?