A population of 1000 citizens has the following characteristics:
| | Age <= 65 | Age 65+ |
|---|---|---|
| Smoking | 40 | 210 |
| Non-smoking | 600 | 150 |
Question: The prevalence of smoking in the population is p.smoke = 0.25. This prevalence is unknown to some researchers, who want to estimate it. They therefore sample 20 citizens and use the sample proportion as an estimate of p.smoke.
However, because of the modern technique used (a short questionnaire), the elderly are less likely to respond. As a result, every selected person is older than 65 with probability 0.3 and younger with probability 0.7. The bias of the estimate obtained with this sampling strategy equals:
a) 0
b) 0.2166667
c) -0.03125
d) -0.21875
How do I tackle this question?
Assuming that people over 65 smoke at the same rate as those aged 65 or under, age makes no difference, and you only need to compute the bias. You could also estimate the standard error, but that is not bias. Bias is the systematic error you make every time, for instance if you always record the ceiling of some continuous value.
By definition, the bias is the expected value of the estimator minus the true parameter. For the sample proportion $\hat\theta = \overline X$, $$ \mathbb{E} \hat\theta = \mathbb{E} \overline X = \mathbb{E} \frac1n \sum_{i} X_i = \frac1n \sum_i \mathbb{E} X_i = \frac1n \cdot n \cdot p = p, \qquad \operatorname{bias} \hat\theta = \mathbb{E} \hat\theta - p = 0, $$ where $p$ is the true parameter and $X_i$ is 1 if the $i$-th sampled citizen smokes and 0 otherwise. So under that assumption the expected value is the true value and the bias is zero. Think of it this way: take 1 citizen. You will always get either 0% or 100%, but you make no systematic mistake: averaged over all citizens, these little experiments give the true ratio of smokers to the whole population.
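The argument above depends on both age groups smoking at the same rate. With the counts in the question's table that does not hold: the rate is 40/640 for those aged 65 or under and 210/360 for those over 65. So it is worth checking what the expected estimate becomes under the 0.7 / 0.3 selection probabilities. Here is a minimal sketch of that check. It assumes that, within each age group, every citizen is equally likely to be the one selected. The names `counts`, `select_prob` and `smoking_rate` are made up for illustration.

```python
from fractions import Fraction

# Counts from the table in the question: (smokers, non-smokers) per age group.
counts = {"young": (40, 600), "old": (210, 150)}

# Probability that a selected respondent belongs to each age group (from the question).
select_prob = {"young": Fraction(7, 10), "old": Fraction(3, 10)}


def smoking_rate(smokers, non_smokers):
    """Proportion of smokers within one age group."""
    return Fraction(smokers, smokers + non_smokers)


# True prevalence in the whole population: 250 / 1000.
total_smokers = sum(s for s, _ in counts.values())
total_citizens = sum(s + n for s, n in counts.values())
p_true = Fraction(total_smokers, total_citizens)

# Assumption: within an age group, the respondent is drawn uniformly at random,
# so each sampled person smokes with probability
#   P(young) * rate(young) + P(old) * rate(old).
# By linearity of expectation this is also the expected sample proportion,
# whatever the sample size.
expected_estimate = sum(
    select_prob[group] * smoking_rate(*counts[group]) for group in counts
)

bias = expected_estimate - p_true

print(float(expected_estimate), float(bias))  # 0.21875 -0.03125
```

Under these assumptions the expected estimate is 7/32 = 0.21875 and the bias is -1/32 = -0.03125, which matches option c). The sample size of 20 affects the variance of the estimate but not its bias.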