Estimate the number of sides of the die


Suppose you have a fair die with an unknown number of sides, $T$. Each side is distinct, marked with a complex symbol. Given enough rolls, can you estimate $T$?

My intuition is that I can do this via a generalization of the mark-and-recapture procedure. Roll the die some large number of times, keeping track of all the sides you have rolled and the number of times you have rolled each. Since the probability of rolling a previously rolled side in $N$ rolls is completely determined by $N$ and $T$, the expected proportion $p$ of distinct sides to total rolls should also be determined by $N$ and $T$. What is the relationship between $p$, $N$, and $T$?

As pointed out by Don Thousand, you can save yourself a lot of overhead and track just one arbitrarily picked side. Since the probability of rolling the chosen side on any one roll is $T^{-1}$, you expect to see that side repeated $NT^{-1}$ times. You can use standard statistical techniques to determine how precise this estimate is likely to be. If you have reasonably strong priors about the likely size of the population, within an order of magnitude or so, you can probably work out how big an $N$ you need to satisfy your requirements.
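That single-side procedure can be sketched in a few lines. The helper name `estimate_sides_single` and the simulated rolls are mine, purely for illustration:

```python
import random

def estimate_sides_single(rolls, tracked):
    # The expected count of the tracked side in N rolls is N/T,
    # so invert: T_hat = N / count.
    count = rolls.count(tracked)
    if count == 0:
        return None  # tracked side never observed; no estimate available
    return len(rolls) / count

# Simulate 10,000 rolls of a hidden 8-sided die and estimate T.
random.seed(0)
rolls = [random.randrange(8) for _ in range(10_000)]
print(estimate_sides_single(rolls, tracked=0))  # close to 8 for N this large
```

The estimator is just the method of moments applied to the tracked side's count; its spread follows from the binomial variance $N T^{-1}(1 - T^{-1})$ of that count.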

However, this procedure is inefficient, since we will likely see re-rolls of sides other than the one we picked, and the frequency of those re-rolls is nearly as informative. How can we combine all the information we have into an estimate of $T$?


Here is one way you could do it. Let $D_n$ be the number of distinct symbols observed after $n$ rolls, and let $S$ be the number of symbols. I'll assume we have a prior distribution $\mathbb{P}(S = s) = \pi(s)$ before observing any rolls.

This answer shows that if $d \leq s$, $$\mathbb{P}(D_n = d \mid S = s) = \frac{d!\binom{s}{d} \def\stir#1#2{\left\{{#1\atop#2}\right\}}\stir nd}{s^n},$$

where $\stir nd$ is a Stirling number of the second kind. Using Bayes' Theorem we can compute $$\mathbb{P}(S = s \mid D_n = d) = \frac{\mathbb{P}(D_n = d \mid S = s) \mathbb{P}(S = s)}{\sum_s \mathbb{P}(D_n = d \mid S = s) \mathbb{P}(S = s)}.$$
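The likelihood formula and the Bayes update above translate directly into code. This is a sketch (the function names are mine); the Stirling numbers come from the standard recurrence $S(n,k) = k\,S(n-1,k) + S(n-1,k-1)$:

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    # Stirling number of the second kind via S(n,k) = k*S(n-1,k) + S(n-1,k-1)
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def likelihood(d, n, s):
    # P(D_n = d | S = s) = d! * C(s, d) * S(n, d) / s^n, zero when d > s
    if d > s:
        return 0.0
    return factorial(d) * comb(s, d) * stirling2(n, d) / s ** n

def posterior(d, n, prior):
    # Bayes' theorem over a discrete prior given as a dict {s: pi(s)}
    unnorm = {s: likelihood(d, n, s) * p for s, p in prior.items()}
    total = sum(unnorm.values())
    return {s: w / total for s, w in unnorm.items()}
```

With the prior from the example below, `posterior(7, 15, {6: 0.1, 7: 0.2, 8: 0.4, 9: 0.2, 10: 0.1})` reproduces the hand computation.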

To give an example, suppose I know that $S$ is in the set $\{6, 7, 8, 9, 10\}$ and that my prior is $\pi(6) = \pi(10) = 0.1$, $\pi(7) = \pi(9) = 0.2$, and so $\pi(8) = 0.4$. Now suppose we observe the following 15 rolls:

K K Q H X G A X X Q C C C G X

We have $D_{15} = 7$ because there are 7 distinct observed symbols: $\{A, C, G, H, K, Q, X\}$. Immediately, $\mathbb{P}(S = 6 \mid D_{15} = 7) = 0$, because we know there are at least 7 symbols. Next we calculate the likelihood $\times$ prior terms:

  • $\mathbb{P}(D_{15} = 7 \mid S = 6) \mathbb{P}(S = 6) = 0$,
  • $\mathbb{P}(D_{15} = 7 \mid S = 7) \mathbb{P}(S = 7) = 0.433\ \cdot \ 0.2 = 0.087$,
  • $\mathbb{P}(D_{15} = 7 \mid S = 8) \mathbb{P}(S = 8) = 0.468\ \cdot \ 0.4 = 0.187$,
  • $\mathbb{P}(D_{15} = 7 \mid S = 9) \mathbb{P}(S = 9) = 0.360\ \cdot \ 0.2 = 0.072$,
  • $\mathbb{P}(D_{15} = 7 \mid S = 10) \mathbb{P}(S = 10) = 0.247\ \cdot \ 0.1 = 0.025$.

Then the posterior probabilities are

  • $\mathbb{P}(S = 6 \mid D_{15} = 7) = 0$,
  • $\mathbb{P}(S = 7 \mid D_{15} = 7) = 0.087/0.371 = 0.23$,
  • $\mathbb{P}(S = 8 \mid D_{15} = 7) = 0.187/0.371 = 0.50$,
  • $\mathbb{P}(S = 9 \mid D_{15} = 7) = 0.072/0.371 = 0.19$,
  • $\mathbb{P}(S = 10 \mid D_{15} = 7) = 0.025/0.371 = 0.07$.

So we are now more convinced than before that $S=8$ (and indeed I used $S=8$ to generate the sequence of rolls).

You could also look into using Approximate Bayesian Computation (ABC) to solve this problem. There is a really nice example application of ABC here.
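To give a flavor of what rejection ABC would look like here (the function name and the exact-match acceptance rule are my choices; in general one accepts when a summary statistic is within a tolerance of the observed value):

```python
import random
from collections import Counter

def abc_posterior(d_obs, n, prior, draws=50_000, seed=1):
    # Rejection ABC: draw S from the prior, simulate n rolls of an S-sided die,
    # and keep S whenever the simulated distinct count matches the observed one.
    rng = random.Random(seed)
    sides = list(prior)
    weights = [prior[s] for s in sides]
    accepted = Counter()
    for _ in range(draws):
        s = rng.choices(sides, weights)[0]
        distinct = len({rng.randrange(s) for _ in range(n)})
        if distinct == d_obs:
            accepted[s] += 1
    total = sum(accepted.values())
    return {s: c / total for s, c in accepted.items()}
```

For this problem the exact posterior is available, so ABC is overkill, but the same sampler works unchanged for models whose likelihood is intractable.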

A related problem that you might find interesting if you haven't seen it before is the coupon collector's problem.