Running average by storing only a single number

32 Views Asked by At

Say I have ratings (1-100) for a list of movies:

  • Guardians of the Galaxy: 92
  • Interstellar: 83
  • Wonder Woman: 95

Every time I watch one of the movies, I may provide a new rating. E.g. after watching each movie four times:

  • Guardians of the Galaxy: 92, 60, 85, 83
  • Interstellar: 83, 80, 82, 80
  • Wonder Woman: 95, 98, 78, 88

I would then want to see the average:

  • Guardians of the Galaxy: 80
  • Interstellar: 81.25
  • Wonder Woman: 89.75

If I store n numbers (all the ratings), I can easily get the average by adding them all and dividing by n.

If I store 2 numbers (currentAverage, numberTimesSeen), I can also easily get the average when I add a new rating. E.g. after seeing each movie 3 times, I would get:

  • Guardians of the Galaxy: (79, 3)
  • Interstellar: (81.66, 3)
  • Wonder Woman: (90.33, 3)
v1 + v2 + v3       n - 1        v4        v1 + v2 + v3 + v4
------------   *   -----   +   ----   =   -----------------
   n - 1             n           n                n

When I add the new ratings (83, 80, 88, respectively), I would get:

  • Guardians of the Galaxy: 79*3/4 + 83/4 = 80
  • Interstellar: 81.66*3/4 + 80/4 = 81.25
  • Wonder Woman: 90.33*3/4 + 88/4 = 89.75

Our new values would then be:

  • Guardians of the Galaxy: (80, 4)
  • Interstellar: (81.25, 4)
  • Wonder Woman: (89.75, 4)

However, I'm still forced to store two values.

Can I get a reasonably accurate value that takes into the history by storing only one value? What's the best formula to use?

1

There are 1 best solutions below

0
On BEST ANSWER

One solution is to use a low pass filter as a strategy:

rating = alpha * oldRating + (1 - alpha) * newRating

Where alpha is a value between 0 and 1:

  • If alpha is 0, it will completely ignore the oldRating
  • If alpha is 0.25, newRating is going to have a heavier weight
  • If alpha is 0.5, oldRating has just as much influence as newRating (i.e. it's taking the average of the two)
  • If alpha is 0.75, oldRating is going to have a heavier weight
  • If alpha is 1, it will completely ignore the newRating

For brief intro to low pass filters, Apple talks about them in Building Advanced Gesture Recognizers (around the 32:45 minute mark).

Here you can visually see how different alpha numbers affect the ratings. The blue line will represent the real ratings at any given time. Each colored line represents a different alpha value chosen. E.g. the yellow line is at 0.9 alpha, thus heavily weighing the oldRating, so it takes a while for it to catch up to the real rating, but produces a smoother curve.

enter image description here


Assuming we choose alpha = 0.5, we would end up with the following ratings at each iteration:

Actual ratings:

  • Guardians of the Galaxy: 92, 60, 85, 83
  • Interstellar: 83, 80, 82, 80
  • Wonder Woman: 95, 98, 78, 88

1st iteration:

  • Guardians of the Galaxy: 92
  • Interstellar: 83
  • Wonder Woman: 95

2nd iteration:

  • Guardians of the Galaxy: (92+60)/2 = 76
  • Interstellar: (83+80)/2 = 81.5
  • Wonder Woman: (95+98)/2 = 96.5

3rd iteration:

  • Guardians of the Galaxy: (76+85)/2 = 80.5
  • Interstellar: (81.5+82)/2 = 81.75
  • Wonder Woman: (96.5+78)/2 = 87.25

4th iteration:

  • Guardians of the Galaxy: (80.5+83)/2 = 81.75
  • Interstellar: (81.75+80)/2 = 80.87
  • Wonder Woman: (87.25+88)/2 = 87.6

These are not too far from the real averages:

  • Guardians of the Galaxy: 80
  • Interstellar: 81.25
  • Wonder Woman: 89.75

And only required storing a single value. A nice thing about the formula is you can tweak it to your liking. So if you want historic data to contribute more to the rating, you could use an alpha of 0.6 instead of 0.5, for example.