(multiple choice) Given a random sample of equally divided possibilities what is the true population mean


Online Test Scenario:

A random sample of $500$ is taken from a large population, which is known to be equally divided between males and females, and values for the quantity of interest are recorded. On examination of the results, it is found that the sample taken includes $200$ females for which the mean is $10.2$ with standard deviation of $0.6$ and $300$ males for which the mean is $14.8$ and standard deviation $2.4$.

Question:

Which one of the following statements is NOT correct?

  1. Taking the mean value of the data for the $500$ sampled would over-estimate the true population mean

  2. The most accurate estimate of the mean of the quantity of interest would have been obtained by sampling equal numbers of males and females

  3. Given the sample that was taken, the best estimate of the population mean is the average of the means of the males and females i.e. $12.5$

  4. The estimate from the sample of the mean for females is likely to be more accurate than that for males

My Attempt:

My guess is that option $4)$ is incorrect, because you're told at the beginning that the population is known to be divided equally?


There are 2 best solutions below

On BEST ANSWER

Statement #4 is likely true.

If the sample is indeed random, then the $200$ females constitute a random sample of the females in the population. Similarly, the $300$ males constitute a random sample of the males in the population.

A rough estimate of how far a sample mean is from a population mean is the sample’s standard error. Here, the standard error for females is ${0.6\over\sqrt{200}}\approx0.04$ and for males is ${2.4\over\sqrt{300}}\approx 0.14$. It’s reasonable to assume then that the expected inaccuracy of the female sample mean (as an estimate of the female population mean) is about ${0.04\over0.14}\approx 0.3$ of the expected inaccuracy of the male sample mean as an estimate of the male population mean.
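As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `standard_error` is my own, and it assumes the reported sample standard deviations can be plugged in directly as $s/\sqrt{n}$.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Figures taken from the question.
se_female = standard_error(0.6, 200)
se_male = standard_error(2.4, 300)

# Relative inaccuracy of the female mean compared with the male mean.
ratio = se_female / se_male

print(f"SE (females): {se_female:.4f}")
print(f"SE (males):   {se_male:.4f}")
print(f"ratio:        {ratio:.3f}")
```

With unrounded standard errors the ratio comes out near $0.31$, so the female mean is roughly three times as precise as the male mean.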

Statement #3 is likely true.

The best estimate we have from the sample is that the male population average is $14.8$ and the female population average is $10.2$. Based on these best estimates, and using the given that the population is half female and half male, one would expect the population average across males and females to be ${10.2+14.8\over2}=12.5$. (This is different from the sample average, which would be a weighted average of the two means with weights $300\over500$ and $200\over500$.)

Statement #1 is likely true. The particular sample happens to have more than the expected number of males, and males appear to have a higher value of the quantity of interest, so the sample mean is likely an over-estimate.
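To make the comparison behind statements #1 and #3 concrete, the sketch below computes both estimates from the figures in the question. The variable names are illustrative only; the equal population weights come from the "equally divided" premise.

```python
# Figures taken from the question.
n_female, mean_female = 200, 10.2
n_male, mean_male = 300, 14.8

# Plain mean of all 500 observations: each group is weighted by its
# share of the sample (40% female, 60% male).
pooled_mean = (n_female * mean_female + n_male * mean_male) / (n_female + n_male)

# Estimate that weights each group by its known share of the
# population (50% each), regardless of how many were sampled.
stratified_mean = 0.5 * mean_female + 0.5 * mean_male

print(f"pooled sample mean:  {pooled_mean:.2f}")
print(f"stratified estimate: {stratified_mean:.2f}")
```

The pooled mean is about $12.96$ against $12.5$ for the equally weighted estimate, which is the over-estimate that statement #1 describes.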

Statement #2 is likely false. Assuming, as appears likely from the sample, that males’ values of the quantity of interest are more spread out, a better estimate would be found by sampling more males. Consider this extreme scenario: A quantity of interest is constant and equal to $1$ for females in a population, but it is spread out between $0$ and $2$ for males, with an unknown mean. To estimate the population mean, it would not be useful to sample more females than needed to suspect that the female value was constant.
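The claim can be checked numerically. The sketch below is not part of the original argument: it assumes the sample standard deviations ($0.6$ and $2.4$) are the true group standard deviations, ignores any finite-population correction, and uses the standard stratified-sampling result that, for a fixed total sample size, the variance is smallest when each group's sample size is proportional to its population share times its standard deviation (often called Neyman allocation). The function names are my own.

```python
import math

# Assumed population shares and standard deviations (the SDs are the
# sample values from the question, treated here as if they were known).
W_FEMALE, W_MALE = 0.5, 0.5
SD_FEMALE, SD_MALE = 0.6, 2.4
TOTAL = 500

def stratified_se(n_female, n_male):
    """Standard error of W_f * mean_f + W_m * mean_m."""
    variance = (W_FEMALE**2 * SD_FEMALE**2 / n_female
                + W_MALE**2 * SD_MALE**2 / n_male)
    return math.sqrt(variance)

def neyman_allocation(total):
    """Split `total` in proportion to share * standard deviation."""
    weight_f = W_FEMALE * SD_FEMALE
    weight_m = W_MALE * SD_MALE
    n_f = total * weight_f / (weight_f + weight_m)
    return n_f, total - n_f

n_f_opt, n_m_opt = neyman_allocation(TOTAL)

se_equal = stratified_se(250, 250)       # equal numbers, as in statement #2
se_actual = stratified_se(200, 300)      # the sample actually taken
se_optimal = stratified_se(n_f_opt, n_m_opt)

print(f"equal split 250/250: SE = {se_equal:.4f}")
print(f"actual split 200/300: SE = {se_actual:.4f}")
print(f"allocation {n_f_opt:.0f}/{n_m_opt:.0f}: SE = {se_optimal:.4f}")
```

Under these assumptions the equal split gives the largest standard error of the three, and the proportional-to-spread split (about $100$ females and $400$ males) gives the smallest.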

[While not part of the question, it’s worth noting that a random sample with the given male-female split is very unusual. Within the distribution of all random samples of size 500 from a population with equally many males as females, the $z$-score of a sample with $200$ females (instead of the expected $250$) is about $4.47$. Under a normal approximation, only about $0.0008$% of samples, or roughly one in $130{,}000$, would have a split at least this unbalanced in either direction, if the population were indeed equally divided. This would make me question the premises of the question!]
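For readers who want to check how unusual the split is, here is a small sketch using only the Python standard library. It assumes the number of females in the sample is Binomial$(500, 0.5)$, and it reports a two-sided tail probability (a split at least this lopsided in either direction) both from the normal approximation and from the exact binomial sum.

```python
import math

n, p = 500, 0.5
observed_females = 200

# z-score of the observed count under Binomial(n, p).
expected = n * p
sd = math.sqrt(n * p * (1 - p))
z = (expected - observed_females) / sd

# Two-sided normal tail: P(|Z| > z) = erfc(z / sqrt(2)).
p_normal = math.erfc(z / math.sqrt(2))

# Exact two-sided binomial tail. With p = 0.5 the distribution is
# symmetric, so double the lower tail P(X <= 200).
lower_tail_count = sum(math.comb(n, k) for k in range(observed_females + 1))
p_exact = 2 * lower_tail_count / 2**n

print(f"z-score: {z:.2f}")
print(f"two-sided tail (normal approx.): {p_normal:.2e}")
print(f"two-sided tail (exact binomial): {p_exact:.2e}")
```

Both figures are on the order of $10^{-5}$ or smaller, so a split like this would be extremely rare for a truly random sample from an equally divided population.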

On

Let $M$ be the mean of the males. Let $F$ be the mean of the females.

$1$) is correct because taking the mean value of the whole sample would weight $M$ more heavily than $F$ ($60$% to $40$%), and $M$ is the larger of the two.

$2$) is kinda correct, though really as long as you take the average of $M$ and $F$, not of the entire sample group, you don't need to get equal samples of males and females.

$3$) is correct because it reiterates the idea of averaging $M$ and $F$, and not taking the mean of the entire sample itself.

$4$) [Credit to SteveKass] is correct because the larger standard deviation associated with the sample of males more than offsets the larger sample size, so there is less accuracy in $M$.

So in conclusion, $2$ is iffy: since we're averaging $M$ and $F$, having equal sample sizes isn't as important.