Estimating the number of sides of a die


A friend of mine inspired this question:

Given a fair die, can one accurately estimate (within some margin) the number of sides of the die given enough rolls? For example: if I roll a die 1000 times and every result falls between 1 and 5 inclusive, I estimate that the die has 5 sides.

My question is: is this an accurate estimate? Will we always be able to predict the size of the die within some margin of error?

Any help is appreciated.

Best Answer

As Bruce has said in the comments, the most obvious choice for the estimator of the number of sides is the maximum of all the $k$ rolls seen. Let's take that and see what we can learn about it.

Let $x_i$ be the result of the $i$th roll. Then our estimator is $$ \hat{N} \equiv \max_i \; x_i. $$ The estimator fails when the largest value of the die doesn't come up in the $k$ rolls. Because the die is assumed to be fair, every value is equally likely: $$ P(x_i = n) = \frac{1}{N} \qquad \text{for all } i \text{ and all } n \in \{1, \dots, N\}. $$ The probability that a single roll is not the largest value is $$ P(x_i \ne N) = 1 - P(x_i = N) = 1 - \frac{1}{N}. $$ Since the rolls are independent, the probability of never rolling the largest value $N$ over the full course of $k$ rolls is $$ P\left(\bigcap_{i=1}^k (x_i \ne N)\right) = \prod_{i=1}^k P(x_i \ne N) = \left( 1- \frac{1}{N}\right)^k = \left(\frac{N-1}{N}\right)^k. $$

Now say we want this failure probability to be at most some margin of error $\epsilon$. (For example, $\epsilon = 0.01$ represents a 1% probability that the largest number doesn't show up in our $k$ rolls.) $$ \begin{split} \left(1-\frac{1}{N}\right)^k &\le \epsilon \\ k\;\underbrace{\log\left(1 - \frac{1}{N}\right)}_{<\,0} &\le \log(\epsilon) \\ k &\ge \frac{\log{\epsilon}}{\log\left(1-\frac{1}{N}\right)} \end{split} $$ (Dividing by the negative logarithm flips the inequality.) So, given a margin $\epsilon$ (which must be strictly between 0 and 1) and the number of sides of the die, we can tell how many rolls $k$ we need to achieve that confidence. The tables below show the smallest such $k$ for several values of $N$.

$$ \epsilon = 0.01 $$ $$ \begin{array}{l|cccccccccc} N & 2 & 4 & 6 & 8 & 10 & 12 & 20 & 26 & 40 & 100 \\ \hline k & 7 & 17 & 26 & 35 & 44 & 53 & 90 & 118 & 182 & 459 \end{array} $$

$$ \epsilon = 0.001 $$ $$ \begin{array}{l|cccccccccc} N & 2 & 4 & 6 & 8 & 10 & 12 & 20 & 26 & 40 & 100 \\ \hline k & 10 & 25 & 38 & 52 & 66 & 80 & 135 & 177 & 273 & 688 \end{array} $$

Alternatively, we can fix the number of rolls $k$ and ask how large a die we can identify within a given margin of error $\epsilon$.
Working it backwards from above, you get $$ N \le \frac{1}{1 - e^{\frac{\log(\epsilon)}{k}}} = \frac{1}{1 - \epsilon^{1/k}}, $$ which says that for $k = 1000$ you'd be able to identify up to a 217-sided die with 99% probability, and up to an 87-sided die with 99.999% probability.
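The formulas above are easy to check numerically. Here is a short Python sketch (the function names are mine) that computes the required number of rolls for a given die, the largest identifiable die for a given number of rolls, and a Monte Carlo estimate of the probability that the maximum-roll estimator misses the top face:

```python
import math
import random

def required_rolls(n_sides, eps):
    """Smallest k such that the probability of never seeing the
    largest face in k rolls, (1 - 1/N)^k, is at most eps."""
    return math.ceil(math.log(eps) / math.log(1 - 1 / n_sides))

def max_identifiable_sides(k, eps):
    """Largest N such that k rolls miss the top face with
    probability at most eps: N <= 1 / (1 - eps^(1/k))."""
    return math.floor(1 / (1 - eps ** (1 / k)))

def miss_probability(n_sides, k, trials=100_000, seed=0):
    """Monte Carlo estimate of the probability that the maximum of
    k rolls of a fair n_sides-sided die is less than n_sides."""
    rng = random.Random(seed)
    misses = sum(
        max(rng.randint(1, n_sides) for _ in range(k)) < n_sides
        for _ in range(trials)
    )
    return misses / trials
```

For example, `required_rolls(6, 0.01)` reproduces the value 26 from the first table, and `max_identifiable_sides(1000, 0.01)` evaluates the floor of the bound for $k = 1000$, $\epsilon = 0.01$. The simulation should agree with the closed form $(1 - 1/N)^k$ up to sampling noise.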