Bad example of parameter estimation?

Our professor gave us the following example and said that this is a typical one when it comes to parameter estimation.

A wholesaler receives a shipment of $40{,}000$ bananas and wants to estimate the proportion of spoiled bananas, denoted by $m$. So he takes a sample of $100$ bananas and counts $5$ spoiled ones. He decides to use $\frac{5}{100}$ as an estimate for the proportion of spoiled bananas in the whole shipment. Is this a good estimate?


Taking a sample of $100$ bananas and counting the spoiled ones can be modeled by a hypergeometrically distributed random variable $X$ (drawing $100$ bananas without replacement from $40{,}000$, of which $40{,}000\,m$ are spoiled). Since $\mathbb{E}(X) = 100\,m$, the expected value of the estimator is simply $$ \mathbb{E}\left(\frac{X}{100}\right)= m. $$ So the estimator is unbiased.
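The unbiasedness claim can be checked exactly for a concrete case. The following is only a sketch: the function name `expected_proportion` and the count of $2{,}000$ spoiled bananas (so $m = 1/20$) are assumptions made for illustration, since the true $m$ is unknown in the example.

```python
from fractions import Fraction
from math import comb


def expected_proportion(N: int, K: int, n: int) -> Fraction:
    """Exact E[X/n] for X ~ Hypergeometric(population N, K spoiled, sample size n)."""
    total = comb(N, n)  # number of equally likely samples of size n
    # Sum k * (number of samples containing exactly k spoiled bananas).
    weighted = sum(
        k * comb(K, k) * comb(N - K, n - k) for k in range(min(n, K) + 1)
    )
    return Fraction(weighted, total * n)


# Hypothetical shipment: 40,000 bananas, 2,000 of them spoiled (m = 1/20).
N, K, n = 40_000, 2_000, 100
print(expected_proportion(N, K, n))  # prints 1/20, i.e. exactly K/N
```

Using `Fraction` keeps the arithmetic exact, so the result equals $m$ itself rather than a floating-point approximation of it.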

However, this example doesn't make a lot of sense to me. Usually, when performing an estimation, we should have a larger sample, and here the wholesaler takes only one sample, namely the $100$ bananas. So in my opinion the wholesaler should take more than one sample (independent ones, of course) to get an appropriate estimator.

What do you think? Is this just a bad example?

There is 1 best solution below

There is nothing explicitly wrong with anything described in the example. There are plenty of missing details which could have an impact (for example, the mechanism by which the sample is taken), but as long as we assume that the 100 bananas sampled were not in any way "special" compared to the rest (i.e. any of the 40000 bananas could have been in the sample of 100, presumably with equal probability), then the proportion of sampled bananas that are rotten is indeed an unbiased estimator for the proportion of rotten bananas in the whole shipment.

Saying that they should take multiple samples is not really right, because they did take multiple observations: the one sample consists of 100 bananas, each of which is an individual draw. Again, it's not clear how those 100 were chosen, but there's nothing in the wording that explicitly suggests that they just took the first 100 bananas they saw, or picked a single crate of 100 bananas, or anything else.

As to whether it's a "good" estimator: in the absence of any additional information, it's probably the best possible estimator available for the chosen sample, because you have no other information you could use to improve it. If you know how the sample was selected, you can determine the variance of the estimator, and if you know something about the process by which bananas rot, you can potentially argue about which sampling method will give better results, but that's all outside the scope of the question.
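As a rough illustration of the variance remark, suppose the 100 bananas were drawn by simple random sampling without replacement (an assumption, since the example does not say how they were picked). Then $X$ is hypergeometric and $\operatorname{Var}(X/n) = \frac{m(1-m)}{n}\cdot\frac{N-n}{N-1}$. The sketch below evaluates this with the observed $5/100$ plugged in for the unknown $m$; the function name `proportion_std_error` is made up for this example.

```python
from math import sqrt


def proportion_std_error(N: int, n: int, m: float) -> float:
    """Standard deviation of X/n for a simple random sample of size n,
    drawn without replacement from N items of which a share m is spoiled."""
    binomial_var = m * (1 - m) / n  # variance if draws were independent
    fpc = (N - n) / (N - 1)         # finite population correction
    return sqrt(binomial_var * fpc)


# Plug-in assumption: use the observed 5/100 in place of the unknown m.
se = proportion_std_error(N=40_000, n=100, m=0.05)
print(round(se, 4))  # about 0.0218, i.e. roughly 2.2 percentage points
```

Under these assumptions the estimate $0.05$ comes with a standard error of about $0.02$, so the estimator is unbiased but not especially precise at this sample size. Because $n$ is tiny compared to $N$, the finite population correction is almost $1$ and barely changes the result.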