Estimating true average from sampled average with Inspection Paradox

I'm facing the following problem (situation simplified). I'm sampling a group of students and I need to estimate the number of families that have 2 kids (or the probability that a family has 2 kids). Assume:

  1. Siblings are all part of the group I'm sampling from
  2. Each student has either 0 or 1 siblings, for simplicity

I run the survey and get back the following results:

  1. 0 siblings, frequency = 4
  2. 1 sibling, frequency = 9

I arrive at a biased estimate of the probability that a given family has 2 kids: (0*4 + 1*9)/13 ≈ 0.69. This is biased because I'm more likely to sample someone who has 1 sibling, given that all siblings are in the same group. If my overall population is about 100, how can I estimate the true probability from this data? Is it even possible? Do I need more information?

To see the bias, assume a simple case with 3 kids: 2 siblings and 1 only child. That gives 2 families, one of them with 2 kids. The survey method above would yield a probability of $\frac{2}{3}$ when in reality it should be $\frac{1}{2}$.
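The toy case can be checked by counting the same population two ways, once per kid and once per family. This is only an illustrative sketch; the variable names are made up for the example, and it assumes every kid in the group answers the survey.

```python
from fractions import Fraction

# Toy population: each entry is one family, given by its number of kids.
families = [2, 1]

# Child-level view: fraction of kids who report having a sibling.
kids_with_sibling = sum(size for size in families if size == 2)
total_kids = sum(families)
child_level = Fraction(kids_with_sibling, total_kids)

# Family-level view: fraction of families that have 2 kids.
two_kid_families = sum(1 for size in families if size == 2)
family_level = Fraction(two_kid_families, len(families))

print(child_level)   # 2/3
print(family_level)  # 1/2
```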

There is 1 solution below

Your sampling is fine, as is your estimate of the mean. You might have a population of $32$ only children and $72$ children who each have one sibling. The average number of siblings per child is exactly as you calculated it, $9/13 \approx 0.69$. That doesn't mean that only about one third of the families have one child. Your population would then represent $32$ families with one child and $36$ families with two children, so a little over half the families ($36/68 \approx 0.53$) have two children. Of course, your sample is quite small, so the variance of your estimate is quite high, but that doesn't change the point of the question.
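The conversion from child-level counts to family-level fractions can be sketched as follows. The function name is hypothetical, and the sketch assumes every family has 1 or 2 kids and that each kid is equally likely to be sampled, so the sample proportions stand in for the population proportions.

```python
from fractions import Fraction

def two_child_family_fraction(only_children, children_with_sibling):
    """Fraction of families with 2 kids, from child-level counts.

    Assumes families have 1 or 2 kids and that every kid is equally
    likely to appear in the counts (an assumption of this sketch).
    """
    # Two kids represent each two-child family, so halve that count.
    two_child_families = Fraction(children_with_sibling, 2)
    one_child_families = Fraction(only_children)
    return two_child_families / (one_child_families + two_child_families)

sample = two_child_family_fraction(4, 9)        # survey counts
population = two_child_family_fraction(32, 72)  # scaled-up population

print(sample, round(float(sample), 3))  # 9/17 0.529
print(population)                       # 9/17
```

Both calls give the same answer because the correction depends only on the ratio of the two counts, not on the population size.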