Bayes Estimator Wikipedia Example

This Bayes estimator Wikipedia example feels disconnected from the rest of the article.

I get the idea that $W$ is a weighted average of the movie's average rating and the global average rating, but not how the formula relates to the article's broader concepts. I assume that the Bayesian updating process consists of the reevaluation of $W$ (via the given formula) each time $v$ increments, but the example does not explicitly tie in the loss function or the Bayes estimator. It is unclear what expressions such as $a|\theta - \hat \theta|$ have to do with the rating system.

The entire article is a bit confusing to me, so I'm trying to use the movie example to ground the concepts. Which piece in this example is the loss function and which piece is the Bayes estimator? Why are they useful when the formula for $W$ is already explicitly given?

There are 2 best solutions below

Answer 1:

In the motivation for the formula, we imagine that every movie has an unknown true rating; this is the unknown parameter $\theta$. The true rating is a parameter, because the actual ratings that users give to the movie are i.i.d. samples from some distribution with mean $\theta$.

The value $W$ is the estimator $\hat \theta$, based on $R$ (the average of $v$ samples from the distribution with mean $\theta$) and $C$ (the expected value of $\theta$ based on the prior).

Earlier in the Wikipedia article, we see similar formulas for the situation where "the prior estimate and a measurement are normally distributed". So, for example, let's suppose:

  • The prior distribution of $\theta$ is normal with mean $C$ and variance $\Sigma^2$;
  • Each rating given to a movie with true rating $\theta$ is normal with mean $\theta$ and variance $\sigma^2$.

Then $R$, the average of $v$ movie ratings, is also normal with mean $\theta$, but it has variance $\frac{\sigma^2}{v}$. The formula on Wikipedia tells us that the posterior distribution is a normal distribution with mean $$\frac{\sigma^2/v}{\sigma^2/v + \Sigma^2}C + \frac{\Sigma^2}{\sigma^2/v + \Sigma^2} R = \frac{(\frac{\sigma}{\Sigma})^2 C + Rv}{(\frac{\sigma}{\Sigma})^2 + v}$$ and variance $\frac{\Sigma^2\,\sigma^2/v}{\Sigma^2 + \sigma^2/v}$. This is exactly IMDb's formula $W = \frac{Rv + mC}{v + m}$ if we assume that $m$ is the ratio between $\sigma^2$ (the variance in users' ratings of a specific movie) and $\Sigma^2$ (the variance in true ratings across movies).
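To see the correspondence concretely, here is a small Python check; all the numeric values ($C$, $\Sigma$, $\sigma$, $R$, $v$) are made up for illustration, not taken from the article:

```python
# Numeric check (made-up values): the normal-normal posterior mean
# equals IMDb's W = (Rv + mC)/(v + m) when m = (sigma/Sigma)^2.

C, Sigma = 7.0, 1.0   # prior on true ratings: N(C, Sigma^2)   (assumed values)
sigma = 2.0           # individual user ratings: N(theta, sigma^2)
R, v = 8.5, 50        # observed average rating and number of votes

# Precision-weighted posterior mean from the normal-normal update
post_mean = ((sigma**2 / v) * C + Sigma**2 * R) / (sigma**2 / v + Sigma**2)

# IMDb's weighted-average formula with m = (sigma/Sigma)^2
m = (sigma / Sigma) ** 2
W = (R * v + m * C) / (v + m)

print(post_mean, W)  # the two quantities agree
```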

Answer 2:

Try this. I suspect the loss function used is proportional to $(\theta-\hat \theta)^2$.
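That claim about squared-error loss can be sanity-checked by brute force on a small discrete posterior; every number below is invented for illustration:

```python
# Brute-force check (invented numbers): under squared-error loss, the
# estimate minimising the expected loss is the posterior mean.

values = [1, 2, 3, 4, 5]            # assumed possible values of the parameter
probs = [0.1, 0.2, 0.4, 0.2, 0.1]   # assumed posterior probabilities

def expected_loss(a):
    return sum(p * (x - a) ** 2 for x, p in zip(values, probs))

grid = [i / 100 for i in range(100, 501)]   # candidate estimates in [1, 5]
best = min(grid, key=expected_loss)         # grid minimiser of expected loss
mean = sum(p * x for x, p in zip(values, probs))  # posterior mean

print(best, mean)  # both equal 3 (up to floating-point rounding)
```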

  • Suppose IMDb has a Dirichlet prior distribution for the proportion of votes of each number for a film and selected parameters $w_1,w_2,\ldots, w_{10}$, where $w=\sum w_i$ and $C=\frac{\sum_1^{10} i \,w_i}w$
  • The prior distribution will have a joint density proportional to $\prod_1^{10} x_i^{w_i-1}$ where $x_i$ is the probability that a particular vote is $i$ rather than any of the other possibilities
  • If the sample votes for a particular film are $v_1,v_2,\ldots, v_{10}$ with total $v=\sum v_i$ and mean $R=\frac{\sum_1^{10} i\, v_i}v$ then this gives a likelihood proportional to $\prod_1^{10} x_i^{v_i}$
  • So the posterior joint density is proportional to the product of the prior density and the likelihood, i.e. $\prod_1^{10} x_i^{v_i+w_i-1}$, which is another Dirichlet distribution
  • This gives a posterior marginal density for each $x_i$ proportional to $x_i^{v_i+w_i-1}(1-x_i)^{v-v_i+w-w_i-1}$, a Beta distribution with mean $\mathbb E[x_i]=\frac{v_i+w_i}{v+w}$
  • If the loss function is proportional to $(x_i - \hat x_i)^2$, then to minimise the expected loss we want to choose $\hat x_i$ to minimise $\mathbb E[(x_i - \hat x_i)^2]= \mathbb E[x_i^2] -2 \hat x_i \mathbb E[x_i] + \hat x_i^2$; this is minimised when its derivative with respect to $\hat x_i$, namely $-2 \mathbb E[x_i] + 2\hat x_i$, is zero, i.e. when $\hat x_i=\mathbb E[x_i]=\frac{v_i+w_i}{v+w}$
  • That gives a loss-minimising estimator $\hat x_i$ for each $x_i$, and combining them gives the loss-minimising expected vote for a particular film, based on the prior and the likelihood of the actual votes: $$\sum_1^{10} i\, \hat x_i = \frac{\sum_1^{10} i(v_i+w_i)}{v+w} = \frac{Rv+Cw}{v+w}.$$
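The algebra in the final bullet can be confirmed numerically; the prior pseudo-counts and vote counts below are invented for illustration:

```python
# Sketch with invented numbers: Dirichlet prior pseudo-counts w_i and observed
# vote counts v_i; the posterior expected vote equals (Rv + Cw)/(v + w).

w_counts = [1, 1, 1, 2, 3, 4, 5, 4, 3, 1]        # assumed prior pseudo-counts w_1..w_10
v_counts = [0, 0, 1, 2, 3, 10, 20, 30, 20, 14]   # assumed votes v_1..v_10 for one film

w = sum(w_counts)
v = sum(v_counts)
C = sum(i * wi for i, wi in enumerate(w_counts, start=1)) / w  # prior mean rating
R = sum(i * vi for i, vi in enumerate(v_counts, start=1)) / v  # sample mean rating

# Posterior mean of each x_i is (v_i + w_i)/(v + w), so the expected vote is:
posterior_rating = sum(
    i * (vi + wi) for i, (vi, wi) in enumerate(zip(v_counts, w_counts), start=1)
) / (v + w)

weighted = (R * v + C * w) / (v + w)
print(posterior_rating, weighted)  # the two agree
```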