I like math, but I also like movies. I have been collecting movies all my life, and my collection is rather huge: almost 25,000 movies. Being a developer as well, I was able to create my own online catalogue and pull various statistics from the database. There is one thing that puzzles me.
Movies have ratings, and I did not invent mine: I copied them from IMDb. As you probably know, IMDb ratings go from 1 to 10, with 1 being the lowest. I have created a histogram of the ratings distribution, and it looks like this:
I expected to see something like a normal distribution, but my histogram has a funny dip around rating 7.0.
Is this a known phenomenon in statistics?
Has anyone seen something like this in other data?


You can get the full IMDb dataset (updated daily) from here!
As of 27/06/2023, it contains 293,501 rated films. The distribution of their ratings is shown below:
As you can see, the full dataset doesn't show the same bimodal distribution as the curated sample in the question.
This suggests that the sampling is producing the bimodality. There are lots of possible reasons for this, but perhaps the datasets will let you explore a bit more.
Many of the films have 500 votes or fewer. If we discount those, we're left with around 58k films, whose ratings distribution is shown below:
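If you want to reproduce this kind of filtering and binning yourself, here is a minimal sketch using only the Python standard library. It assumes you have downloaded the `title.ratings.tsv` file from IMDb's datasets page; the column names `averageRating` and `numVotes` follow that file's published schema. The `rating_histogram` helper and the bin width of 0.5 are my own illustrative choices, demonstrated here on a tiny synthetic sample rather than the real file:

```python
import csv
import io
from collections import Counter

def rating_histogram(tsv_text, min_votes=0, bin_width=0.5):
    """Bin averageRating values from title.ratings.tsv-style data,
    keeping only titles with more than min_votes votes."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bins = Counter()
    for row in reader:
        if int(row["numVotes"]) > min_votes:
            rating = float(row["averageRating"])
            # Snap the rating down to the start of its half-point bin.
            bins[round(rating // bin_width * bin_width, 1)] += 1
    return dict(bins)

# Tiny synthetic sample in the dataset's tab-separated format:
sample = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t6.8\t2045\n"
    "tt0000002\t7.1\t312\n"
    "tt0000003\t6.9\t887\n"
)
print(rating_histogram(sample, min_votes=500))  # → {6.5: 2}
```

For the real dataset you would read the downloaded file's text instead of the `sample` string; the same `min_votes=500` cutoff reproduces the "discount low-vote films" step described above.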
One striking feature of these charts is just how high the ratings are: a rating of 5 does not seem to correspond to an "average" film. Perhaps you get a few rating points simply for making a film at all ;-).