I am reading the paper "From Amateurs to Connoisseurs: Modeling the Evolution of User Expertise through Online Reviews" and I don't understand one of its graphs. In particular, I don't understand what the y-axis is trying to show.
Here is the relevant text (emphasis mine):
Figure 1 examines the relationship between product ratings, user experience level, and hoppiness, on RateBeer data. Beers of three types are considered (in increasing order of hoppiness):
lagers, mild ales, and strong ales. The x-axis shows the average rating of products on the site (out of 5 stars), while the y-axis shows the difference between expert and novice ratings. ‘Experts’ are those users to whom our model assigns the highest values of our latent experience score, while ‘novices’ are assigned the lowest value; Figure 1 then shows the difference in product bias terms between the two groups. This simple plot highlights some of our main findings with regard to user evolution: firstly, there is significant variation between the ratings of beginner and expert users, with the two groups differing by up to half a star in either direction. Secondly, there exist entire genres of products that are preferred almost entirely by experts or by beginners: beginners give higher ratings to almost all lagers, while experts give higher ratings to almost all strong ales; thus we might conclude that strong ales are an ‘acquired taste’.
Questions:
- If the y-axis shows a "difference", how can it be negative?
- How does the graph show a "significant variation between the ratings of beginner and expert users"? To me it looks like points are most densely clustered around y=0.
- How does the graph show "there exist entire genres of products that are preferred almost entirely by experts or by beginners: beginners give higher ratings to almost all lagers, while experts give higher ratings to almost all strong ales"? We don't actually know who the experts or novices are (just their difference) so how can we say "beginners give higher ratings to lager" based on the graph?
I have a great deal of difficulty reading graphs in academic papers.
I could be mistaken, but from what I gather:
1. It appears that the difference is computed as expert score minus novice score, so it is signed. If an expert gives a 1 and a novice gives a 4, then the "difference" is $1-4 = -3$.
2. The vertical spread is the variation. Points above zero are items the experts rated higher than the novices did; points below zero are items the novices rated higher than the experts did.
3. Combined with (2), the color clustering shows that the experts favored strong ales and the novices favored lagers. For strong ales the difference is positive, meaning experts rated them higher. For lagers the difference is negative, meaning experts rated them less favorably than novices did. So, even though the lagers received fewer stars on average, the novices did not give them as few stars as the experts did.
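To make the sign convention concrete, here is a small Python sketch. The numbers are made up for illustration and are not taken from the paper, and it uses plain average ratings rather than the product bias terms the paper actually plots. It only shows how a signed "expert minus novice" difference tells you which group prefers an item, even though the plot never shows the two groups' ratings separately.

```python
# Hypothetical ratings, invented for illustration; not data from the paper.
ratings = [
    # (style, average novice rating, average expert rating)
    ("lager", 3.0, 2.6),
    ("mild ale", 3.4, 3.5),
    ("strong ale", 3.8, 4.3),
]


def signed_difference(novice, expert):
    """Expert minus novice: positive means experts rate the item higher."""
    return expert - novice


differences = {style: signed_difference(n, e) for style, n, e in ratings}

for style, diff in differences.items():
    if diff > 0:
        favored_by = "experts"
    elif diff < 0:
        favored_by = "novices"
    else:
        favored_by = "neither group"
    print(f"{style}: {diff:+.1f} (favored by {favored_by})")
```

The sign alone carries the "who prefers it" information, which is why the graph can support the claim about lagers and strong ales without plotting each group's ratings.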