Imagine a random experiment in which I pick someone from the population and ask for his/her favorite movie, F.
There exist about 300,000 movies, but F is obviously not uniformly distributed: some movies have many more fans than others.
Now, imagine that I repeat the experiment N=1000 times and collect the answers $f_i$ ($1 \le i \le N$). How could I estimate the smallest number of movies that together account for R=80% of the probability mass of F?
My ideas for solving the problem:
An important step would be to identify the family of distributions that F belongs to, and then to estimate the parameters of that distribution.
Intuitively, I would say that a geometric distribution is a good candidate. A geometric distribution is defined over the integers while, in my case, the distribution is over movies. But if I rank the movies from the most popular to the least popular, I can use a geometric distribution over the rank. One could also object that a geometric distribution has unbounded support while the number of movies is finite. That number is large, though, so the geometric distribution could still be a good approximation.
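To make this concrete, here is the model I have in mind, written out. The symbol $M \approx 300\,000$ for the total number of movies is my own notation for illustration, and $X$ is the rank of the favorite movie. The second expression is the probability mass that the unbounded support places beyond the last existing movie:

$$P(X = k) = p\,(1-p)^{k-1}, \quad k = 1, 2, \dots \qquad\qquad P(X > M) = (1-p)^{M}$$

So the approximation should be harmless as long as $(1-p)^{M}$ is negligible.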
I have found how to compute the maximum likelihood estimator of the parameter $p$ of the geometric distribution (see "unbiased estimator for geometric distribution").
So, my idea would be to:
order the movies in the sample from the one that appears most often among the $f_i$ to the one that appears least often;
estimate the parameter p with $\hat p=\dfrac{N}{N+\sum_{i=1}^N(X_i-1)}$ (where $X_i$ is the rank of $f_i$);
use the CDF of the geometric distribution, $1-(1-\hat p)^k$, to find the smallest k for which it reaches R = 0.8.
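Here is a minimal Python sketch of the three steps above, to make the idea testable. The function name `estimate_top_k` and its parameters are hypothetical. It assumes the ranks are taken from the same sample that is used to estimate p, with ties broken by order of first appearance, and it solves $1-(1-p)^k \ge R$ as $k = \lceil \ln(1-R) / \ln(1-p) \rceil$.

```python
import math
from collections import Counter


def estimate_top_k(samples, coverage=0.8):
    """Return (p_hat, k) under an assumed geometric distribution over rank.

    samples  : list of hashable items (the observed favorite movies f_i)
    coverage : target probability mass R, with 0 < coverage < 1
    """
    counts = Counter(samples)

    # Step 1: rank items by decreasing sample frequency (rank 1 = most frequent).
    ranked = [item for item, _ in counts.most_common()]
    rank = {item: r for r, item in enumerate(ranked, start=1)}

    # Step 2: maximum likelihood estimate of p.
    # n / (n + sum(X_i - 1)) simplifies to n / sum(X_i).
    n = len(samples)
    p_hat = n / sum(rank[f] for f in samples)

    # Degenerate case: every answer is the same movie, so one movie covers everything.
    if p_hat >= 1.0:
        return p_hat, 1

    # Step 3: smallest k such that 1 - (1 - p_hat)^k >= coverage.
    k = math.ceil(math.log(1 - coverage) / math.log(1 - p_hat))
    return p_hat, k


# Toy usage: ranks are A=1, B=2, C=3, so p_hat = 10 / 17.
p_hat, k = estimate_top_k(["A"] * 5 + ["B"] * 3 + ["C"] * 2)
print(p_hat, k)
```

On this toy sample the two top-ranked items are enough, since $1-(7/17)^2 \approx 0.83 \ge 0.8$.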
What do you think of this approach?