Formula to work out likely number of unique items

32 Views Asked by At

There is an unknown quantity of unique prizes. Each time you play the game, you get a prize. The likelihood of getting any type of prize X is equal to getting the likelihood of any other type, and winning a prize does not decrease the likelihood that that type will be won in future.

I won 14 prizes. Where '1' represents a type of prize I have never won before, and '0' represents winning a type of prize I have won before, I got the following ordered list:

(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0)

I know that each time I play the game, the probability of getting a prize that I have not won before is (N-a)/N where a is the number of unique prizes I have already won.

What formula can I use to determine what the most likely number of prizes is?

1

There are 1 best solutions below

2
On

There is more than one possible approach.

One is to say that if there are $n$ types of prizes then the probability of the pattern you have seen (i.e. the likelihood) would have been $\frac{n}{n}\frac{n-1}{n}\frac{n-2}{n}\frac{n-3}{n}\frac{n-4}{n}\frac{n-5}{n}\frac{n-6}{n}\frac{n-7}{n}\frac{8}{n}\frac{8}{n}\frac{n-8}{n}\frac{n-9}{n}\frac{10}{n}\frac{10}{n}$. If you try to find the $n$ which maximises this, you will find the maximum likelihood estimate of $n$. I think it may give $\hat n \approx 17.9741$ or $18$ if you restrict $n$ to an integer

An alternative might be to say that if there are $n$ types, then the expected number won after $14$ prizes is $n\left(1 - \left(1-\frac1n\right)^{14}\right)$ and if you set this equal to $10$ then you would get the method of moments estimate which I think is $\hat n \approx 18.4865$

A third method might be to assign a (possibly impropor) prior distribution to $n$, use the likelihood to update this to a posterior distribution for $n$, and then chose one of the central statistics of the posterior distribution such as the mean, median or mode as your estimator.

In any case, $18$ looks a plausible estimate, but it is quite possible that $n$ is some other value and the third method would give you a reasonable sense of that uncertainty.