A friend of mine inspired this question:
Given a fair die, can one accurately estimate (within some margin of error) the number of sides of the die, given enough rolls? For example: if I roll a die 1000 times and every result falls between 1 and 5 inclusive, I estimate that the die has 5 sides.
My question is: is this an accurate estimate? Will we always be able to estimate the number of sides of the die within some margin of error?
Any help is appreciated.
As Bruce said in the comments, the most obvious estimator for the number of sides is the maximum of the $k$ rolls observed. Let's take that and see what we can learn about it.
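To get a feel for how well this estimator works, here is a quick simulation sketch (Python; the function name `estimate_sides` is mine, not from the thread):

```python
import random

def estimate_sides(n_sides, n_rolls):
    """Roll a fair n_sides-sided die n_rolls times; estimate N as the largest face seen."""
    return max(random.randint(1, n_sides) for _ in range(n_rolls))

random.seed(0)  # reproducible example
# With 1000 rolls of a 6-sided die, missing the 6 has probability (5/6)^1000,
# so the estimate is essentially always exact.
print(estimate_sides(6, 1000))
```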
Let $x_i$ be the result of the $i$th roll. Then our estimator is $$ \hat{N} \equiv \max_i \; x_i. $$ The only way this estimator can fail is if the largest value of the die never comes up in the $k$ rolls. Because the die is assumed to be fair, every face is equally likely: $$ P(x_i = n) = \frac{1}{N} \qquad \text{for all } i \text{ and all } n \in \{1, \dots, N\}. $$ The probability that a single roll is not the largest face is $$ P(x_i \ne N) = 1 - P(x_i = N) = 1 - \frac{1}{N}. $$ Since the rolls are independent, the probability of never rolling the largest value $(N)$ over the full course of $k$ rolls is $$ P\left(\bigcap_{i=1}^k (x_i \ne N)\right) = \prod_{i=1}^k P(x_i \ne N) = \left( 1- \frac{1}{N}\right)^k = \left(\frac{N-1}{N}\right)^k. $$

Now say we want to keep this failure probability below some margin of error $\epsilon$. (For example, $\epsilon = 0.01$ represents a 1% chance that the largest number doesn't show up in our $k$ rolls.) $$ \begin{split} \left(1-\frac{1}{N}\right)^k &\le \epsilon \\ k\;\underbrace{\log\left(1 - \frac{1}{N}\right)}_{<\,0} &\le \log(\epsilon) \\ k &\ge \frac{\log{\epsilon}}{\log\left(1-\frac{1}{N}\right)} \end{split} $$ (The inequality flips in the last step because we divide by a negative quantity.) So, given a margin $\epsilon$ (any value strictly between $0$ and $1$) and the number of sides of the die, we can tell how many rolls $k$ we need to achieve that confidence. The tables below show some values of $k$ for a given $N$.

$$ \epsilon = 0.01 $$ $$ \begin{array}{l|cccccccccc} N & 2 & 4 & 6& 8& 10 &12 &20 &26 &40 &100\\ \hline k & 7 & 17 &26 &35 &44 &53 &90 &118 &182 &459 \end{array} $$

$$ \epsilon = 0.001 $$ $$ \begin{array}{l|cccccccccc} N & 2 & 4 &6 &8 &10 &12 &20 &26 &40 &100 \\ \hline k& 10 &25 &38 &52 &66 &80 &135 &177 &273 &688 \end{array} $$

Alternatively, we can fix the number of rolls and ask how large a die we can identify with a given margin of error, by solving the same inequality for $N$.
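The table entries above can be reproduced with a couple of lines of Python (a sketch; `rolls_needed` is my own name for the formula):

```python
import math

def rolls_needed(n_sides, eps):
    """Smallest k such that ((N-1)/N)^k <= eps, i.e. k >= log(eps) / log(1 - 1/N)."""
    return math.ceil(math.log(eps) / math.log(1 - 1 / n_sides))

# Reproduce a few entries of the eps = 0.01 table:
print([rolls_needed(n, 0.01) for n in (2, 6, 20, 100)])  # [7, 26, 90, 459]
```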
Working the inequality backwards from above, you get $$ N \le \frac{1}{1 - e^{\frac{\log(\epsilon)}{k}}} = \frac{1}{1 - \epsilon^{1/k}}, $$ which says that for $k = 1000$ rolls, you'd be able to identify up to a 217-sided die with 99% probability ($\epsilon = 0.01$), and up to an 87-sided die with 99.999% probability ($\epsilon = 10^{-5}$).
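As a sanity check on this bound, a small Python sketch (the function name is hypothetical) evaluates the floor of $1/(1-\epsilon^{1/k})$:

```python
import math

def largest_identifiable(k, eps):
    """Largest N with ((N-1)/N)^k <= eps, from N <= 1/(1 - eps**(1/k))."""
    return math.floor(1 / (1 - eps ** (1 / k)))

print(largest_identifiable(1000, 0.01))   # 99% confidence
print(largest_identifiable(1000, 1e-5))   # 99.999% confidence
```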