I'm a complete novice in this area, so an "explain it like I'm 5" answer would be most appreciated.
Essentially, I've been tasked with changing the relevancy algorithm for products displayed on a website - which is fine...
The 'quality' metric is how users rate the product - there are four states:
Suitable, Maybe, Unsuitable, Pending
There are approximately 100 ratings per day, and as products age their rating counts increase. Broadly, the ratios between the four states stay consistent over time, though there is slight day-to-day variance.
My question: upon making a change to the algorithm, what is the minimum sample size (i.e. how long do I have to run the test) before I can be sure I have a statistically significant result? How can I be confident that any change I make is a genuine improvement, and not just within some margin of error?
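To show the shape of the data I'm working with (these numbers are entirely made up), the only comparison I currently know how to make is something like this:

```python
# One invented day of ratings before and after a hypothetical algorithm change.
before = {"Suitable": 55, "Maybe": 20, "Unsuitable": 15, "Pending": 10}  # 100 ratings
after = {"Suitable": 62, "Maybe": 18, "Unsuitable": 12, "Pending": 8}    # 100 ratings

# The naive comparison: the share of "Suitable" ratings on each day.
share_before = before["Suitable"] / sum(before.values())
share_after = after["Suitable"] / sum(after.values())
print(f"Suitable share: {share_before:.0%} -> {share_after:.0%}")
# ...but I have no idea whether a jump like this is a real improvement
# or just the normal day-to-day wobble.
```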
Thanks for any help!
Here's a suggestion: why not pick a few products and establish your own significance point as follows. For each product, plot the running average rating against the number of ratings it has received.
What you'll typically find is that the line starts off noisy and then converges onto a steady rating.
A common rule of thumb is that around 20 observations are needed before a sample average starts to behave approximately normally. Hence, you will likely find that after roughly 20 observations, your average rating is not significantly different from what you would see after 50 or 100 ratings.
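For example, here is a quick sketch of that idea (the product is invented; I've set its "true" Suitable rate to 70% purely for illustration, coding each rating as 1 for Suitable and 0 for anything else):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Invented product: pretend 70% of its ratings come in as "Suitable".
ratings = rng.binomial(n=1, p=0.70, size=100)  # 1 = Suitable, 0 = anything else

# Running proportion of "Suitable" after each new rating arrives.
running = np.cumsum(ratings) / np.arange(1, ratings.size + 1)

for n in (5, 10, 20, 50, 100):
    print(f"after {n:3d} ratings: running Suitable share = {running[n - 1]:.2f}")

# Plot `running` against the rating count and you will see the noisy start
# and the flattening-out described above; the point where the line stops
# moving much is your own "enough ratings" mark for that product.
```

Once you know roughly how many ratings it takes for a product's share to settle, compare the settled figures before and after your change rather than the noisy early ones.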
I hope this helps.
Regards, Marcus