Sampling Distribution Disturbing Answer


I am currently studying the sampling distribution of the sample mean, and came across the example below here.

Question:
The average male drinks 2L of water when active outdoors with a standard deviation of .7L. You're planning a full day nature trip for 50 men and will bring 110L of water. What's the probability you'll run out?

Given Answer: (transcript taken from here, which is the same as Khan's)
The probability of running out of water is the probability of using more than 110L of water. This is the same as the probability that the average water use is greater than 2.2L (110L divided by 50 men) per man.

P(average water use > 2.2L per man)

$\mu_{\bar{x}} = \mu = 2$L
$\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.7}{\sqrt{50}} = 0.099$

We just need to figure out how many standard deviations of the sample mean 2.2L is away from the mean (known as the z-score).

$z = \dfrac{2.2-\mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{2.2-2}{0.099} = 2.02$

The probability that the average water use is > 2.2L per man is the same as the probability that the sample mean will be more than 2.02 standard deviations above the mean. Now you can use a z-table to figure out that probability.

0.9783 is the probability that we're less than 2.02 standard deviations above the mean

P(running out of water) = 1 - .9783 = .0217
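
The arithmetic above can be checked with a short Python sketch. It assumes, as the given answer does, that the sample mean is approximately normal; the variable names are illustrative and not part of the original transcript.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from the question.
mu, sigma, n = 2.0, 0.7, 50      # population mean (L), population SD (L), group size
water_total = 110.0              # litres brought on the trip

std_error = sigma / sqrt(n)      # SD of the sample mean, sigma / sqrt(n)
threshold = water_total / n      # average use per man at which the water runs out
z = (threshold - mu) / std_error # z-score of 2.2L under the sampling distribution

# Upper-tail probability under the standard normal (assumed approximation).
p_run_out = 1 - NormalDist().cdf(z)

print(f"standard error = {std_error:.3f}")
print(f"z = {z:.2f}")
print(f"P(running out) = {p_run_out:.4f}")
```

Using the unrounded standard error gives a z-score and tail probability that agree with the table lookup to the precision shown in the transcript.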

My questions:

  1. Broadly, what does a result of 2.17% tell us, and how is it practically different from 1%, 3%, or even 5%? What realistic action or use could come out of this inference?

  2. "Average male" refers to a huge, perhaps the entire, male population. Isn't a sample of just 50 men too small for the sampling distribution to be treated as normal? (We are told nothing about the population distribution in the question, unless one assumes that it is normal as well.)

  3. Even if 50 is enough for normality, it is just one sample (of 50 men). Shouldn't we get a normal distribution only when we repeat this for N trials, so that the normal-distribution effect can take place?

  4. Why is this not a sampling distribution of the sample proportion?

  5. Isn't it counterintuitive that the sampling distribution has a lower SD (so higher certainty)? It makes one wonder whether the sampling distribution is better than the population distribution. How is the latter more beneficial?

Kindly clarify.


There are 3 best solutions below


The main problem with this question, I think, is that we don't know that your $50$ men are a random sample from the population of all males, or how the activity they will be doing will compare to whatever those average males were doing. If the weather is like what it is in much of North America and Europe these days, they'll need a lot more water!


My comments are from a practical aspect of this analysis.

  1. The $2.17\%$ probability is a metric used to compare against a predetermined risk factor. In this particular case, the consequences of running out of water would have to be assessed for an acceptable level of risk. Is someone going to die or become dehydrated or just a little thirsty? Normally a predetermined risk factor would be $.05$. If lives are at stake then $.01$ or even smaller may be more appropriate.

  2. Ask yourself if the $50$ men on a nature trip are a typical cross section of the male population. They probably weren't a random sample, so they are likely to be biased or unrepresentative in some way. If they are older, or different in some other way that would affect their hydration, then this needs to be taken into account.

  3. You are treating this sample as a simple random sample, which it probably isn't. Even so, what aspects of normality are you looking at to decide if the sample is normal? If the population is normal, and the sample is random and large enough, then the criteria for a valid test have been met.

  4. Please clarify this point.

  5. No. The smaller SD here belongs to the sampling distribution of the mean, whose SD is $\sigma/\sqrt{n}$: averages over $50$ men vary less than individual men do. That is a separate matter from the sample SD, where dividing by $n-1$ rather than $n$ corrects the tendency of a sample to understate the spread of the population it came from. The goal is to obtain a valid and accurate test by applying criteria that have been determined to work best for a majority of cases. It isn't an exact method and is prone to error. However, in a well designed test, measures can be taken to limit these errors as much as possible.
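
To make the point about a predetermined risk level concrete, the calculation can be turned around: pick the acceptable risk first, then work out how much water that implies. The sketch below assumes the same normal approximation for the sample mean as the given answer; `water_needed` is a hypothetical helper name, not something from the original problem.

```python
from math import sqrt
from statistics import NormalDist

def water_needed(risk, n=50, mu=2.0, sigma=0.7):
    """Litres to bring so that P(running out) is roughly `risk`.

    Assumes the average use per man is approximately normal with
    mean mu and standard deviation sigma / sqrt(n).
    """
    mean_dist = NormalDist(mu, sigma / sqrt(n))
    per_man = mean_dist.inv_cdf(1 - risk)   # average use exceeded with probability `risk`
    return n * per_man

for risk in (0.05, 0.0217, 0.01):
    print(f"risk {risk:.4f}: bring about {water_needed(risk):.1f} L")
```

Read this way, moving from a 5% to a 1% risk level is a decision about carrying a few extra litres, which is the kind of practical action the probability supports.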

  1. I guess risk can be expressed in decimals, as stated by phil.
  2. I guess it could be too small, but for the purpose of solving and understanding the problem you have to assume that the sample size is sufficiently large. Otherwise there is no point in asking the question, because the resulting sampling distribution would not be normal.

3. Even if 50 is normal, it is just 1 sample (of size 50 men). Shouldn't we get a normal distribution only when we repeat this N-trials or N-number of times, to have the normal distribution effect take place?

I had the same question on my mind for a while, but then it came to me: yes, he is assuming that we have taken N trials, because that is how the sampling distribution of the sample mean is constructed. But that does not change the sample size. The sample size remains 50 no matter how many trials you take, so we are able to calculate $$\large\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
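
The idea of repeating the trip many times can be tried out in a small simulation. This is only a sketch: the question does not give the shape of the population, so a gamma distribution with mean 2 and SD 0.7 is assumed here purely for illustration, and the number of trials and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(0)                    # fixed seed so the run is repeatable

mu, sigma, n = 2.0, 0.7, 50       # figures from the question
trials = 5000                     # number of simulated trips (arbitrary choice)

# Assumed population: gamma with mean mu and SD sigma (skewed, not normal).
scale = sigma ** 2 / mu
shape = mu / scale

def one_trip_mean():
    """Average water use per man on one simulated trip of n men."""
    return sum(random.gammavariate(shape, scale) for _ in range(n)) / n

sample_means = [one_trip_mean() for _ in range(trials)]

mean_of_means = sum(sample_means) / trials
sd_of_means = stdev(sample_means)
frac_over = sum(m > 2.2 for m in sample_means) / trials

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"SD of sample means   = {sd_of_means:.3f}  (theory: {sigma / sqrt(n):.3f})")
print(f"fraction of trips with average > 2.2L = {frac_over:.3f}")
```

Each simulated trip still has 50 men; only the number of trips changes. The spread of the trip averages should come out close to $\sigma/\sqrt{n}$, even though a single planned trip is just one draw from that distribution.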

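4. How is it not Sampling distribution of sample proportion? I guess my 3rd answer answers your 4th question as well. It is a sampling distribution of the sample mean: the quantity being averaged is litres of water per man, a measurement rather than a yes/no outcome. What makes you think it is not?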

5. Isn't it counter intuitive that our sampling distribution has lower SD ( so higher certainty), makes one wonder if sampling distribution is better than population distribution? How latter is more beneficial?

Variance is based on distance from the mean. The original distribution could have a range between, say, 1 and 10,000, and the mean of any sample of size 50 will also lie somewhere between 1 and 10,000. So when a sampling distribution of the sample mean is constructed, its values always lie between 1 and 10,000, but they are squeezed toward the mean compared with the original distribution (the larger the sample size, the greater the squeeze), and hence lie at a smaller distance from the mean. So the resulting SD will always be less for the sampling distribution of the mean. KhanAcademy : practical usage of this can be something like this