Say I have ratings (1-100) for a list of movies:
- Guardians of the Galaxy: 92
- Interstellar: 83
- Wonder Woman: 95
Every time I watch one of the movies, I may provide a new rating. E.g. after watching each movie four times:
- Guardians of the Galaxy: 92, 60, 85, 83
- Interstellar: 83, 80, 82, 80
- Wonder Woman: 95, 98, 78, 88
I would then want to see the average:
- Guardians of the Galaxy: 80
- Interstellar: 81.25
- Wonder Woman: 89.75
If I store n numbers (all the ratings), I can easily get the average by adding them all and dividing by n.
If I store 2 numbers (currentAverage, numberTimesSeen), I can also easily get the average when I add a new rating. E.g. after seeing each movie 3 times, I would get:
- Guardians of the Galaxy: (79, 3)
- Interstellar: (81.66, 3)
- Wonder Woman: (90.33, 3)
v1 + v2 + v3 n - 1 v4 v1 + v2 + v3 + v4
------------ * ----- + ---- = -----------------
n - 1 n n n
When I add the new ratings (83, 80, 88, respectively), I would get:
- Guardians of the Galaxy: 79*3/4 + 83/4 = 80
- Interstellar: 81.66*3/4 + 80/4 = 81.25
- Wonder Woman: 90.33*3/4 + 88/4 = 89.75
Our new values would then be:
- Guardians of the Galaxy: (80, 4)
- Interstellar: (81.25, 4)
- Wonder Woman: (89.75, 4)
However, I'm still forced to store two values.
Can I get a reasonably accurate value that takes into the history by storing only one value? What's the best formula to use?
One solution is to use a low pass filter as a strategy:
Where alpha is a value between 0 and 1:
alphais 0, it will completely ignore the oldRatingalphais 0.25, newRating is going to have a heavier weightalphais 0.5, oldRating has just as much influence as newRating (i.e. it's taking the average of the two)alphais 0.75, oldRating is going to have a heavier weightalphais 1, it will completely ignore the newRatingFor brief intro to low pass filters, Apple talks about them in Building Advanced Gesture Recognizers (around the 32:45 minute mark).
Here you can visually see how different alpha numbers affect the ratings. The blue line will represent the real ratings at any given time. Each colored line represents a different alpha value chosen. E.g. the yellow line is at 0.9 alpha, thus heavily weighing the oldRating, so it takes a while for it to catch up to the real rating, but produces a smoother curve.
Assuming we choose alpha = 0.5, we would end up with the following ratings at each iteration:
Actual ratings:
1st iteration:
2nd iteration:
3rd iteration:
4th iteration:
These are not too far from the real averages:
And only required storing a single value. A nice thing about the formula is you can tweak it to your liking. So if you want historic data to contribute more to the rating, you could use an alpha of 0.6 instead of 0.5, for example.