I want to have a value for users between 0-1 , that shows how much they like movies of specific type depending on how many movies they have watched and movie type ( comedy , etc ). I have a data of users watching movies during years.
data format
- user - movie - year - type
Was thinking of something like
user1_comdey = number of comedy movies / number of watched movies
How can I use the years in the equation to make recent years more important ?
What we call this in Math ? I didn't know what best tags to use.
Thanks
Long comment:
Let's focus on one particular user, Alice, and on comedy movies, during a period of years $\{y_1, y_2, \ldots, y_n\},$ where $y_n$ is the most recent year. Assume Alice watched total $t_{i}$ movies during each year $i.$ Further more, she watched $c_{i}$ comedy movies during that year $i.$
To model Alice's preference of comedy movies, we can use the fraction $$ \frac{\sum_{i = 1}^{n} c_i}{\sum_{i = 1}^{n} t_i} = \frac{\text{total number of comedy movies watched}}{\text{total number of movies watched}} $$
To add more weight to recent years, you can use a weighted sum, and assign higher weights to recent years. In other words, Alice's comedy score would be: $$ \frac{\sum_{i = 1}^{n} w_i c_i}{\sum_{i = 1}^{n} t_i} $$ where, for example, $$w_n = n, w_{n-1} = n-1, \ldots, w_1 = 1.$$ This weight assignment intuitively says that if Alice watched a comedy movie recently then we are going to count it more than once.
You can use different weight schemes. For example, you can read about exponential weights here in this Wikipedia article on Moving Averages.