Maximum-likelihood estimator for the number of marbles


Let there be $n$ marbles in a box, each bearing a unique number from $1$ to $n$. You pick a marble, write down its number, and put it back. This process is repeated $N$ times. Give a maximum-likelihood estimator $T$ for $n$ and its relative expected value $E[T]/n$ for large $n$.

The first part is pretty straightforward. Each pick $X_i$ is a uniformly distributed random variable on $\{1, \ldots, n\}$, so the likelihood function is $\frac{1}{n^N}\prod_{i=1}^N\chi_{\{1, \ldots, n\}}(x_i)$. Since this is decreasing in $n$ and vanishes for $n < \max_i x_i$, it is maximized at $T(x_1, \ldots, x_N)=\max_i x_i$.
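For what it's worth, a quick numerical check (a sketch with an arbitrary sample and search range, not part of the problem) confirms that the likelihood peaks at the sample maximum:

```python
import random

random.seed(0)
n_true, N = 50, 10  # hypothetical true n and number of draws
sample = [random.randint(1, n_true) for _ in range(N)]

# Likelihood L(n) = n^{-N} if n >= max(sample), else 0.
def likelihood(n, xs):
    return 0.0 if n < max(xs) else n ** (-len(xs))

# L(n) is strictly decreasing in n where it is positive,
# so the argmax over candidate values of n is max(sample).
mle = max(range(1, 2 * n_true + 1), key=lambda n: likelihood(n, sample))
print(mle == max(sample))  # True
```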

For the expected value I get $E[T] = \sum_{k=1}^n P(\max_i X_i = k)\cdot k = \frac{1}{n^N}\sum_{k=1}^n k\left(k^N-(k-1)^N\right)$, where I used $P(\max_i X_i = k) = P(\max_i X_i \le k) - P(\max_i X_i \le k-1)$. Thus in the limit I get $\lim_{n\to\infty} E[T]/n = 1$. Intuitively this feels wrong though: how can the maximum of a fixed number of picks stay of order $n$ as I keep increasing $n$? Did I get the wrong result?
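A rough Monte Carlo probe of $E[T]/n$ for a fixed $N$ and growing $n$ (the parameter values here are arbitrary) suggests the ratio indeed settles below $1$, which supports the suspicion:

```python
import random

random.seed(1)
N = 5           # fixed number of draws (arbitrary choice)
trials = 20000  # Monte Carlo repetitions per n
for n in (100, 1000, 10000):
    # Average the sample maximum over many repeated experiments.
    total = sum(max(random.randint(1, n) for _ in range(N))
                for _ in range(trials))
    ratio = total / trials / n
    print(n, ratio)  # hovers near 5/6, visibly below 1
```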

BEST ANSWER

Let's start from the beginning. The joint probability of the sample $\boldsymbol x = (x_1, \ldots, x_N)$, where the $x_i$ are IID with $$x_i \sim \operatorname{DiscreteUniform}(1,n),$$ is given by $$f(\boldsymbol x \mid n) = \prod_{i=1}^N f(x_i \mid n) = n^{-N} \mathbb 1(1 \le x_{(1)} \le x_{(N)} \le n).$$ Consequently, the joint likelihood is proportional to $$\mathcal L(n \mid \boldsymbol x) \propto f(\boldsymbol x \mid n),$$ and for a fixed $N$ this likelihood is maximized when $n$ is made as small as possible; i.e., $n = x_{(N)}$. Thus $$\hat n = T(\boldsymbol x) = x_{(N)} = \max_i x_i,$$ the maximum order statistic. So far, our solutions agree.

Now, the distribution of $\hat n$ is given by $$\Pr[x_{(N)} = k] = \Pr[x_{(N)} \le k] - \Pr[x_{(N)} \le k-1],$$ where $$\Pr[x_{(N)} \le k] = \Pr\left[\bigcap_{i=1}^N (x_i \le k)\right] = \prod_{i=1}^N \Pr[x_i \le k] = \left(\frac{k}{n}\right)^N, \quad k \in \{1, \ldots, n\}.$$ Hence $$\Pr[x_{(N)} = k] = \left(\frac{k}{n}\right)^N - \left(\frac{k-1}{n}\right)^N = \frac{k^N - (k-1)^N}{n^N}.$$

The expectation is $$\operatorname{E}[\hat n] = \sum_{k=1}^n k \left(\frac{k^N - (k-1)^N}{n^N}\right).$$ So far, we agree. However, the calculation of this sum is not trivial. First, it partially telescopes: $$\operatorname{E}[\hat n] = n - \sum_{k=1}^{n-1} \frac{k^N}{n^N}.$$ You can convince yourself of this by writing out the first few terms and seeing how they cancel (a formal proof by summation by parts is possible, but I leave it as an exercise). Second, dividing by $n$ yields $$\frac{\operatorname{E}[\hat n]}{n} = 1 - \frac{1}{n} \sum_{k=1}^{n-1} \left(\frac{k}{n}\right)^N.$$

We are now interested in the limit of this sum; namely, $$g(N) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \left(\frac{k}{n}\right)^N.$$ Note that I added the $k = 0$ term, which does not affect the value of the sum, since this term equals zero for positive integers $N$.
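The telescoping step can be spot-checked numerically; this is a sketch over a few small $(n, N)$ pairs, not a proof:

```python
# Direct sum: E[n_hat] = (1/n^N) * sum_{k=1}^{n} k * (k^N - (k-1)^N).
def expectation_direct(n, N):
    return sum(k * (k**N - (k - 1)**N) for k in range(1, n + 1)) / n**N

# Telescoped form: E[n_hat] = n - sum_{k=1}^{n-1} (k/n)^N.
def expectation_telescoped(n, N):
    return n - sum((k / n)**N for k in range(1, n))

# The two expressions agree up to floating-point error.
for n in (5, 20, 100):
    for N in (1, 3, 10):
        assert abs(expectation_direct(n, N) - expectation_telescoped(n, N)) < 1e-9
print("telescoping identity holds on these cases")
```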
Then we recognize $g$ as a Riemann sum $$\int_{x=a}^b h(x) \, dx = \lim_{n \to \infty} \frac{b-a}{n} \sum_{k=0}^{n-1} h\left(a + \frac{b-a}{n} k\right),$$ for the choice $a = 0$, $b = 1$, $h(x) = x^N$. Therefore, $$g(N) = \int_{x=0}^1 x^N \, dx = \frac{1}{N+1},$$ and $$\lim_{n \to \infty} \frac{\operatorname{E}[\hat n]}{n} = 1 - \frac{1}{N+1} = \frac{N}{N+1}.$$ This limiting ratio is clearly less than $1$ and depends on the number of draws made.
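The Riemann-sum limit can also be checked numerically; the values below are illustrative (with an arbitrary choice of $N$), not a proof:

```python
# Left Riemann sum (1/n) * sum_{k=0}^{n-1} (k/n)^N for h(x) = x^N on [0, 1].
def riemann(n, N):
    return sum((k / n)**N for k in range(n)) / n

N = 4  # arbitrary fixed number of draws
for n in (10, 100, 10000):
    print(n, riemann(n, N))  # tends to 1/(N+1) = 0.2

# Hence the limiting ratio E[n_hat]/n is 1 - 1/(N+1) = N/(N+1).
ratio_limit = 1 - riemann(10000, N)
print(ratio_limit)  # close to N/(N+1) = 0.8
```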