Online Test Scenario:
A random sample of $500$ is taken from a large population, which is known to be equally divided between males and females, and values for the quantity of interest are recorded. On examination of the results, it is found that the sample includes $200$ females, for whom the mean is $10.2$ with a standard deviation of $0.6$, and $300$ males, for whom the mean is $14.8$ with a standard deviation of $2.4$.
Question:
Which one of the following statements is NOT correct?
1) Taking the mean value of the data for the $500$ sampled would over-estimate the true population mean
2) The most accurate estimate of the mean of the quantity of interest would have been obtained by sampling equal numbers of males and females
3) Given the sample that was taken, the best estimate of the population mean is the average of the means of the males and females i.e. $12.5$
4) The estimate from the sample of the mean for females is likely to be more accurate than that for males
My Attempt:
My guess is that option $4)$ is incorrect, because you're told at the beginning that the population is known to be equally divided?
Statement #4 is likely true.
If the sample is indeed random, then the $200$ females constitute a random sample of the females in the population. Similarly, the $300$ males constitute a random sample of the males in the population.
A rough measure of how far a sample mean is likely to be from the population mean is the sample's standard error, $s/\sqrt{n}$. Here, the standard error for females is ${0.6\over\sqrt{200}}\approx0.04$ and for males is ${2.4\over\sqrt{300}}\approx 0.14$. It's reasonable to assume, then, that the expected inaccuracy of the female sample mean (as an estimate of the female population mean) is about ${0.6/\sqrt{200}\over2.4/\sqrt{300}}={1\over4}\sqrt{3\over2}\approx 0.31$ of the expected inaccuracy of the male sample mean (as an estimate of the male population mean).
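A minimal Python sketch of the standard-error comparison, assuming (as the answer does) that each sample standard deviation is a reasonable stand-in for the corresponding population standard deviation. It carries full precision instead of the rounded values; the variable names are illustrative.

```python
import math

# Sample summaries from the problem statement
n_f, sd_f = 200, 0.6   # females
n_m, sd_m = 300, 2.4   # males

# Standard error of each sample mean: s / sqrt(n)
se_f = sd_f / math.sqrt(n_f)
se_m = sd_m / math.sqrt(n_m)

# How large the female SE is relative to the male SE
ratio = se_f / se_m

print(f"SE females: {se_f:.4f}")   # SE females: 0.0424
print(f"SE males:   {se_m:.4f}")   # SE males:   0.1386
print(f"ratio:      {ratio:.3f}")  # ratio:      0.306
```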
Statement #3 is likely true.
The best estimates we have from the sample are that the male population average is $14.8$ and the female population average is $10.2$. Based on these estimates, and using the given fact that the population is half female and half male, one would expect the population average across males and females to be ${10.2+14.8\over2}=12.5$. (This is different from the sample average, which weights the two means by the sample counts: ${200\cdot10.2+300\cdot14.8\over500}=12.96$.)
Statement #1 is likely true. The particular sample happens to have more than the expected number of males, and males appear to have a higher value of the quantity of interest, so the sample mean is likely an over-estimate.
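The two competing estimates behind Statements #3 and #1 can be computed side by side. This is only arithmetic on the reported group summaries; the variable names are illustrative.

```python
# Group means and counts from the sample
mean_f, n_f = 10.2, 200
mean_m, n_m = 14.8, 300

# The population is known to be half female, half male,
# so each group mean gets weight 1/2
pop_estimate = 0.5 * mean_f + 0.5 * mean_m

# The raw sample mean weights each group by its share of the sample (200/500 vs 300/500)
sample_mean = (n_f * mean_f + n_m * mean_m) / (n_f + n_m)

print(f"equal-weight estimate: {pop_estimate:.2f}")                # 12.50
print(f"raw sample mean:       {sample_mean:.2f}")                 # 12.96
print(f"over-estimate by:      {sample_mean - pop_estimate:.2f}")  # 0.46
```

Because males are over-represented in this sample and have the larger group mean, the raw sample mean lands above the equal-weight estimate.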
Statement #2 is likely false. Assuming, as appears likely from the sample, that males' values of the quantity of interest are more spread out, a better estimate would be obtained by sampling more males than females. Consider this extreme scenario: a quantity of interest is constant and equal to $1$ for females in a population, but is spread out between $0$ and $2$ for males, with an unknown mean. To estimate the population mean, it would not be useful to sample more females than needed to suspect that the female values were constant.
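The argument above is qualitative. One standard way to quantify it, swapped in here and not taken from the answer, is the stratified-sampling variance of the estimator $\frac12\bar x_f+\frac12\bar x_m$, together with Neyman allocation (group sample sizes proportional to population share times within-group standard deviation). The sketch assumes the sample SDs $0.6$ and $2.4$ are close to the true within-group SDs and ignores the finite-population correction because the population is large; the helper names `stratified_se` and `neyman_allocation` are hypothetical.

```python
import math

# Assumptions: sample SDs approximate the true within-group SDs;
# population is large, so no finite-population correction.
W = {"female": 0.5, "male": 0.5}   # population shares (given)
S = {"female": 0.6, "male": 2.4}   # within-group SDs (estimated from the sample)
N_TOTAL = 500

def stratified_se(alloc):
    """SE of 0.5*mean_f + 0.5*mean_m for a given allocation {group: sample size}."""
    var = sum(W[g] ** 2 * S[g] ** 2 / alloc[g] for g in W)
    return math.sqrt(var)

def neyman_allocation(total):
    """Neyman allocation: n_g proportional to W_g * S_g."""
    weights = {g: W[g] * S[g] for g in W}
    s = sum(weights.values())
    return {g: total * w / s for g, w in weights.items()}

allocations = {
    "equal (250/250)": {"female": 250, "male": 250},
    "observed (200/300)": {"female": 200, "male": 300},
    "Neyman": neyman_allocation(N_TOTAL),
}

for name, alloc in allocations.items():
    se = stratified_se(alloc)
    print(f"{name:20s} n_f={alloc['female']:.0f} n_m={alloc['male']:.0f} SE={se:.4f}")
```

Under these assumptions the optimum is about $100$ females and $400$ males, and even the observed $200/300$ split gives a smaller standard error than an equal $250/250$ split, which is consistent with Statement #2 being the false one.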
[While not part of the question, it's worth noting that a random sample with the given male-female split is very unusual. Within the distribution of all random samples of size $500$ from a population with equally many males as females, the number of females has mean $250$ and standard deviation $\sqrt{500\cdot0.5\cdot0.5}\approx11.18$, so the $z$-score of a sample with $200$ females is about $-4.47$. Under the normal approximation, only about $0.0008$% of samples, or roughly one in $130{,}000$, would be at least this unbalanced in either direction if the population were indeed equally divided. This would make me question the premises of the question!]
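The tail probability can be checked in a few lines of Python, once with the normal approximation behind the $z$-score and once with an exact binomial sum (using `math.comb`, available from Python 3.8). Reading "at least this unbalanced in either direction" as "$200$ or fewer of either sex" is an interpretive assumption.

```python
import math

n, p, observed = 500, 0.5, 200

mean = n * p                       # expected number of females: 250
sd = math.sqrt(n * p * (1 - p))    # binomial SD: sqrt(125), about 11.18
z = (observed - mean) / sd         # about -4.47

# Normal approximation, two-sided: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
p_normal_two_sided = math.erfc(abs(z) / math.sqrt(2))

# Exact binomial: P(X <= 200), doubled by symmetry since p = 0.5
p_exact_one_sided = sum(math.comb(n, k) for k in range(observed + 1)) / 2 ** n
p_exact_two_sided = 2 * p_exact_one_sided

print(f"z = {z:.3f}")
print(f"normal approx: {p_normal_two_sided:.2e} (about 1 in {1 / p_normal_two_sided:,.0f})")
print(f"exact binomial: {p_exact_two_sided:.2e} (about 1 in {1 / p_exact_two_sided:,.0f})")
```

The exact binomial tail comes out a little heavier than the normal approximation here, but both put such a split at roughly one in $10^5$.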