How to build a function that gives recent years higher weight?

I want a value between 0 and 1 for each user that shows how much they like movies of a specific type (comedy, etc.), based on how many movies they have watched and the type of each movie. I have data on which movies users watched over the years.

data format

  • user - movie - year - type

I was thinking of something like

user1_comedy = number of comedy movies / number of watched movies

How can I use the years in the equation to make recent years more important?

What is this called in math? I wasn't sure which tags to use.

Thanks

There are 2 best solutions below

12
On BEST ANSWER

Long comment:

Let's focus on one particular user, Alice, and on comedy movies, over a period of years $\{y_1, y_2, \ldots, y_n\},$ where $y_n$ is the most recent year. Assume Alice watched a total of $t_{i}$ movies during year $y_i.$ Furthermore, $c_{i}$ of those movies were comedies.

To model Alice's preference of comedy movies, we can use the fraction $$ \frac{\sum_{i = 1}^{n} c_i}{\sum_{i = 1}^{n} t_i} = \frac{\text{total number of comedy movies watched}}{\text{total number of movies watched}} $$

To give recent years more weight, you can use a weighted sum that assigns larger weights to recent years. In other words, Alice's comedy score would be: $$ \frac{\sum_{i = 1}^{n} w_i c_i}{\sum_{i = 1}^{n} w_i t_i} $$ where, for example, $$w_n = n, w_{n-1} = n-1, \ldots, w_1 = 1.$$ This weight assignment intuitively says that if Alice watched a movie recently then we are going to count it more than once. Applying the same weights in the denominator keeps the score between 0 and 1, since $c_i \le t_i$ for every year.
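As a rough illustration of the linear weighting, here is a minimal Python sketch. The function name `linear_weighted_score` and the list-based input layout (per-year counts, oldest year first) are assumptions made for this example. It also assumes the weights are applied to both the comedy counts and the yearly totals, so that the result stays between 0 and 1.

```python
def linear_weighted_score(comedy_counts, total_counts):
    """Recency-weighted share of comedies for one user.

    comedy_counts[i] and total_counts[i] are the counts c_i and t_i for
    year y_{i+1}, listed oldest year first (hypothetical input layout).
    """
    if len(comedy_counts) != len(total_counts):
        raise ValueError("both lists must cover the same years")

    n = len(total_counts)
    weights = range(1, n + 1)  # w_1 = 1, ..., w_n = n

    # Assumption: the same weights go into numerator and denominator,
    # which keeps the score in [0, 1] because c_i <= t_i.
    weighted_comedies = sum(w * c for w, c in zip(weights, comedy_counts))
    weighted_total = sum(w * t for w, t in zip(weights, total_counts))

    if weighted_total == 0:
        return 0.0  # no movies watched at all
    return weighted_comedies / weighted_total


# Same overall share of comedies (10 of 20), but watched at different times.
old_fan = linear_weighted_score([8, 2], [10, 10])     # comedies mostly long ago
recent_fan = linear_weighted_score([2, 8], [10, 10])  # comedies mostly recent
```

Both users have an unweighted share of 0.5, but the weighted score is lower for the user whose comedies are mostly in the older year and higher for the user whose comedies are mostly recent.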

You can use different weighting schemes. For example, exponential weights are described in the Wikipedia article on moving averages.

5
On

A simple weighting function is $w_{\text{movie}}=\exp(t_{\text{movie}}-t_0)$, where $t_0$ is the present year and $t_{\text{movie}}$ is the year of the movie. For user $i$:

$u_i(\text{comedy}) \equiv \frac{1}{T_i} \sum_{k \in \text{comedies}} w_{\text{movie}_k}$

where $T_i=\sum_{n \in \text{movies}} w_{\text{movie}_n}$

This will give you a number between 0 and 1, because the comedies are a subset of all the movies the user watched and every weight is positive.
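A minimal Python sketch of this exponential weighting might look like the following. The function name `exp_weighted_score`, the `(year, type)` record layout, and the `rate` parameter are assumptions introduced for illustration. With `rate=1.0` the weight is exactly $\exp(t_{movie}-t_0)$; a smaller rate would make the weights decay more slowly.

```python
import math


def exp_weighted_score(records, genre, current_year, rate=1.0):
    """Exponentially recency-weighted share of one genre for a single user.

    records: iterable of (year, movie_type) pairs for that user
             (hypothetical layout based on the user - movie - year - type data).
    rate:    decay rate; an assumed extra knob, 1.0 matches exp(t_movie - t_0).
    """
    genre_weight = 0.0  # sum of weights over movies of the requested type
    total_weight = 0.0  # T_i: sum of weights over all watched movies

    for year, movie_type in records:
        w = math.exp(rate * (year - current_year))  # <= 1 for past years
        total_weight += w
        if movie_type == genre:
            genre_weight += w

    if total_weight == 0.0:
        return 0.0  # no records for this user
    return genre_weight / total_weight


history = [(2019, "drama"), (2020, "comedy")]
score = exp_weighted_score(history, "comedy", current_year=2020)
```

Here the comedy was watched in the present year (weight 1) and the drama one year earlier (weight $e^{-1}$), so the score is $1/(1+e^{-1})$, which is noticeably above the unweighted share of 0.5.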