Estimating the total attendance


Suppose you do not know how many people are attending a convention, but you do know that as each person entered he was given an identification tag with a number on it. The tags are numbered serially from 1 to N, where N is the unknown number in attendance. You select a random sample of ten people, let us say, and observe that the largest number on their badges is 261. What estimate do you then make of the total attendance at the convention? What is the probability that the total number of people lies in the interval $[261,1000]$?




This is a Bayesian estimation question, so it has no single correct answer: your conclusion depends on your choice of prior. Here's the cleanest choice. Treat each ticket you randomly sample as a draw from a uniform distribution over the interval $[1,N]$ for some unknown parameter $N$. The conjugate prior for the upper endpoint of a uniform distribution is a Pareto distribution, so we'll start with the (improper, uninformative) Pareto prior with $x_m=0$ and $\alpha=1$, i.e. a prior density proportional to $1/N^2$. Each observation raises $\alpha$ by one, and $x_m$ becomes the largest value seen, so after sampling $10$ tickets with a largest result of $261$, our new parameters are $x_m=261$ and $\alpha=11$. You can then obtain your desired estimate of the attendance by computing the mean of a Pareto distribution with those parameters, $\alpha x_m/(\alpha-1) = 11\cdot 261/10 = 287.1$. The probability of the attendance being $\leq 1000$ (equivalently, of it lying in $[261,1000]$, since the posterior puts no mass below $261$) is the CDF at $1000$, namely $1-(261/1000)^{11}$, which is within about $4\times 10^{-7}$ of $1$.
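If you want to check the arithmetic, here is a minimal Python sketch of the two computations, using only the standard closed forms for the Pareto mean and CDF. The function names `pareto_mean` and `pareto_cdf` are made up for this illustration and do not come from any library.

```python
# Posterior parameters from the answer: Pareto(x_m = 261, alpha = 11).
X_M = 261     # largest badge number seen in the sample
ALPHA = 11    # prior alpha (1) plus the sample size (10)


def pareto_mean(x_m: float, alpha: float) -> float:
    """Mean of a Pareto distribution; only finite when alpha > 1."""
    if alpha <= 1:
        raise ValueError("the Pareto mean is infinite for alpha <= 1")
    return alpha * x_m / (alpha - 1)


def pareto_cdf(x: float, x_m: float, alpha: float) -> float:
    """P(X <= x) for a Pareto distribution with scale x_m and shape alpha."""
    if x < x_m:
        return 0.0
    return 1.0 - (x_m / x) ** alpha


posterior_mean = pareto_mean(X_M, ALPHA)            # about 287.1
prob_at_most_1000 = pareto_cdf(1000, X_M, ALPHA)    # very close to 1

print(f"estimated attendance: {posterior_mean:.1f}")
print(f"P(261 <= N <= 1000): {prob_at_most_1000:.9f}")
```

The posterior is so concentrated because $\alpha=11$ makes the tail fall off like $N^{-11}$, so almost all of the probability sits just above $261$.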

Mind you that we're fudging slightly by using continuous distributions when in fact we know that the attendance has to be an integer, but the error introduced is very small.
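To get a feel for how small that error is, here is a rough Python sketch that puts the same posterior shape on the integers, with weights proportional to $N^{-12}$ for $N \geq 261$, and compares it with the continuous formulas. The discrete weights and the truncation point `CUTOFF` are assumptions made for this check, not part of the argument above.

```python
# Compare the continuous Pareto posterior with an integer-valued analogue.
X_M = 261          # largest badge number seen in the sample
ALPHA = 11         # posterior shape parameter from the answer
CUTOFF = 100_000   # assumed truncation point; the tail beyond it is negligible


def discrete_posterior_summary(x_m: int, alpha: int, cutoff: int):
    """Return (mean, P(N <= 1000)) for weights proportional to N**-(alpha + 1)."""
    total = 0.0
    weighted_sum = 0.0
    mass_up_to_1000 = 0.0
    for n in range(x_m, cutoff + 1):
        # Scale by x_m so the weights stay in a comfortable floating-point range.
        w = (x_m / n) ** (alpha + 1)
        total += w
        weighted_sum += n * w
        if n <= 1000:
            mass_up_to_1000 += w
    return weighted_sum / total, mass_up_to_1000 / total


discrete_mean, discrete_prob = discrete_posterior_summary(X_M, ALPHA, CUTOFF)
continuous_mean = ALPHA * X_M / (ALPHA - 1)
continuous_prob = 1.0 - (X_M / 1000) ** ALPHA

print(f"mean:        continuous {continuous_mean:.2f}, discrete {discrete_mean:.2f}")
print(f"P(N<=1000):  continuous {continuous_prob:.9f}, discrete {discrete_prob:.9f}")
```

Under these assumptions the two means should agree to within about one person, and both probabilities are essentially $1$.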