I was asked the following question in an interview: Suppose I am interested in the average number of residents per apartment. I go out on the street, randomly sample people, and ask each of them how many residents live in their apartment; denote the answers by $x_1, x_2, \ldots, x_n$. Find a good estimator of the average number of residents per apartment.
Obviously this is a biased sample, since apartments with more residents are more likely to be sampled. In other words, we oversample "large" apartments, so a simple average would not be a good estimator. I proposed the geometric mean $(\prod x_i)^{1/n}$, which seems to be a better estimator. However, I was subsequently asked to show whether it is biased and whether it is consistent. I am wondering: 1. is this a biased estimator, 2. is it consistent, 3. are there any other estimators?
I don't know where you got the idea to use the geometric mean to estimate the number of people in each apartment. Is it a guess or do you have a rationale?
Let's say that the mean number of people per apartment is $\mu$ and, changing notation from the question, let $x_k$ be the number of respondents who say that their apartment has $k$ people in it. Naively averaging the responses would give us $$ \mu=\frac{x_1+2x_2+3x_3+\dots+kx_k+\dots}{x_1+x_2+x_3+\dots} $$ but this misses empty apartments ($x_0$) and overcounts large apartments, as you describe in the question.

If you assume that the probability that you meet someone from an apartment with $k$ people in it is proportional to $k$, then you can correct the bias by dividing the number of responses $x_k$ by $k$, so that the denominator becomes $x_1+x_2/2+x_3/3+\dots$ and the numerator becomes $x_1+2x_2/2+3x_3/3+\dots+kx_k/k+\dots=x_1+x_2+x_3+\dots+x_k+\dots$. Then $$ \mu = \frac{\sum_k x_k}{\sum_k (x_k/k)}. $$

For example, if we meet two people who live alone, three people who share with one other person, and one person who shares with four other people, then $$ \mu = \frac{6}{2+3/2+1/5}=\frac{60}{37}\approx1.6 $$ which seems reasonable compared to the naive calculation of $$ \frac{2+6+5}{6}=\frac{13}{6}\approx2.2 $$ which tells us that the average apartment contains more than two people, when according to our survey only one respondent out of six lives in an apartment with more than two people.
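The arithmetic in the example can be checked with a few lines of Python. This is only a sketch: the function names `naive_mean` and `corrected_mean` and the dictionary layout (apartment size $k$ mapped to the number of respondents $x_k$) are my own choices for illustration.

```python
from fractions import Fraction

def naive_mean(counts):
    """Plain average of the responses: sum(k * x_k) / sum(x_k)."""
    total = sum(counts.values())
    return Fraction(sum(k * n for k, n in counts.items()), total)

def corrected_mean(counts):
    """Size-bias corrected average: sum(x_k) / sum(x_k / k)."""
    total = sum(counts.values())
    return total / sum(Fraction(n, k) for k, n in counts.items())

# Two people living alone, three in 2-person apartments, one in a 5-person apartment.
counts = {1: 2, 2: 3, 5: 1}
print(naive_mean(counts))      # 13/6
print(corrected_mean(counts))  # 60/37
```

Exact fractions are used so that the output matches the hand calculation rather than a rounded float.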
Conceptually, the uncorrected number gives you the size of the apartment that an average person lives in, whereas the corrected number gives you the number of people in an average apartment.
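A small simulation may make the difference between the two numbers more concrete. It is a sketch under the same assumption as above (each resident is equally likely to be met, so an apartment with $k$ people is sampled with probability proportional to $k$); the size distribution, the function name `simulate`, and the sample sizes are made up for illustration.

```python
import random

def simulate(seed=0, n_apartments=20000, n_samples=20000):
    rng = random.Random(seed)
    # Hypothetical population: occupied apartments with 1 to 5 residents.
    sizes = rng.choices([1, 2, 3, 4, 5], weights=[40, 30, 15, 10, 5], k=n_apartments)
    true_mean = sum(sizes) / len(sizes)

    # One entry per resident, so picking a resident uniformly at random
    # picks an apartment with probability proportional to its size.
    residents = [size for size in sizes for _ in range(size)]
    responses = rng.choices(residents, k=n_samples)

    naive = sum(responses) / n_samples                     # arithmetic mean of responses
    corrected = n_samples / sum(1 / x for x in responses)  # harmonic mean of responses
    return true_mean, naive, corrected

true_mean, naive, corrected = simulate()
print(f"true mean per apartment: {true_mean:.3f}")
print(f"naive estimate:          {naive:.3f}")
print(f"corrected estimate:      {corrected:.3f}")
```

With these settings the corrected estimate should land close to the true per-apartment mean, while the naive estimate should come out noticeably higher.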
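However, this still has a systematic bias: you never meet anyone from an empty apartment, so you do not know $x_0$, and the corrected number is really the average over occupied apartments only.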
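To address the bias and consistency parts of the question, here is a sketch of the argument, still assuming that sampling is proportional to apartment size. The notation is mine: $p_k$ is the fraction of all apartments with $k$ residents, $X_1,\dots,X_n$ are the individual responses, and $\mu=\sum_k k\,p_k$. The corrected estimate $\sum_k x_k/\sum_k (x_k/k)$ is just the harmonic mean $n/\sum_i (1/X_i)$ of the responses. $$ P(X=k)=\frac{k\,p_k}{\mu}\quad(k\ge 1),\qquad E\!\left[\frac1X\right]=\sum_{k\ge1}\frac{1}{k}\cdot\frac{k\,p_k}{\mu}=\frac{1-p_0}{\mu}. $$ By the law of large numbers, $\frac1n\sum_i 1/X_i\to(1-p_0)/\mu$, so $$ \frac{n}{\sum_i 1/X_i}\;\longrightarrow\;\frac{\mu}{1-p_0}, $$ which is the mean over occupied apartments. So the harmonic mean is consistent for that quantity, and for $\mu$ itself only if $p_0=0$. For finite $n$ it is biased upward, because $y\mapsto 1/y$ is convex and Jensen's inequality gives $$ E\!\left[\frac{n}{\sum_i 1/X_i}\right]\;\ge\;\frac{1}{E[1/X]}=\frac{\mu}{1-p_0}, $$ with the bias shrinking as $n$ grows if apartment sizes are bounded. The geometric mean from the question always lies between the harmonic and arithmetic means of the same responses, so it reduces the overcount but does not remove it.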