I have this problem I'm stuck with. I would really appreciate some help.
"The following is a sample from a normal distribution.
$7.6 , 9.6, 10.4, 10.7, 11.9, 14.1, 14.6, 18.5$
a) Let $X$ have this normal distribution and let $p=P(X\geqq 20)$. If we estimate $p$ by the relative frequency, we just get 0. Suggest another estimate.
b) Let $x$ be the 95th percentile, i.e. a value such that $P(X \leqq x) \geqq 0.95$. Suggest an estimate for $x$."
This seems fairly straightforward, but I don't get the same answer as my textbook. I'm thinking that since $X\sim N(\mu,\sigma ^2)$ we can estimate $\mu = \bar{X} = 12.175$, and $\sigma ^2 = s^2 =\frac{1}{n-1}\sum_{k=1}^{n}(X_k-\bar{X})^2 \approx 11.79 \implies s \approx 3.43$. So far so good.
Now $p = P(X\geqq 20) = 1- P(\frac{X-\bar{X}}{s}\leqq\frac{20-\bar{X}}{s})$ where $\frac{X-\bar{X}}{s} \sim F_{t_6}(0,1)$, i.e. a t-distribution with $n-2 = 8-2 = 6$ degrees of freedom. (This is where I'm uncertain whether this is correct.)
This leads to $p = 1-F_{t_6}(2.28)$, which gives the wrong answer according to the book. Part (b) goes wrong in the same way. The main question is how to use the t-distribution correctly. Is the t-distribution even the right choice in this problem?
I would very much appreciate some help because my textbook doesn't explain these types of problems very well.
Cheers!
Estimating a probability. My guess is that you are supposed to estimate $\mu$ as $\hat\mu = \bar X = 12.175$ and $\sigma$ as $\hat \sigma = 3.434.$ Then use the estimated distribution $X \sim \mathsf{Norm}(\hat \mu, \hat\sigma)$ to estimate $P(X \ge 20) \approx 0.01135.$ In R statistical software, this value can be computed with `pnorm`, the normal CDF.
Here is a look at the quantile method suggested in part (b). According to R, the 95th percentile of your data is about 17.1. (Various texts and software programs have various rules for finding quantiles of small datasets. The 'percentile rule' in your book may give a somewhat different answer--presumably somewhere between 14.6 and 18.5.)
If we look for the 95th percentile of the estimated normal distribution, $\mathsf{Norm}(\mu=12.175,\sigma=3.434)$ from above, the answer is not a lot different: about 17.82. (Percentiles of continuous distributions are precisely defined, so there is no quibbling here, except for rounding.)
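The plug-in estimates for parts (a) and (b) can be checked with a short script. This is only a sketch in Python's standard library, not the R session referred to above; the variable names are my own, and the `"inclusive"` quantile rule is an assumption chosen because its linear interpolation between order statistics gives the 17.1 quoted for the sample percentile.

```python
from statistics import NormalDist, mean, quantiles, stdev

data = [7.6, 9.6, 10.4, 10.7, 11.9, 14.1, 14.6, 18.5]

mu_hat = mean(data)      # sample mean, 12.175
sigma_hat = stdev(data)  # sample SD (n - 1 denominator), about 3.434

# Normal distribution with the estimates plugged in for mu and sigma.
fitted = NormalDist(mu_hat, sigma_hat)

p_hat = 1 - fitted.cdf(20)          # part (a): estimate of P(X >= 20)
x95_normal = fitted.inv_cdf(0.95)   # part (b): 95th percentile of the fitted normal

# Sample 95th percentile: n=20 gives 19 cut points, the last one is at 0.95.
x95_sample = quantiles(data, n=20, method="inclusive")[-1]

print(f"P(X >= 20) estimate:       {p_hat:.5f}")
print(f"95th pct, fitted normal:   {x95_normal:.2f}")
print(f"95th pct, sample quantile: {x95_sample:.2f}")
```

With only eight observations the sample quantile depends heavily on the interpolation rule, whereas the fitted-normal percentile does not.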
Having shown you this much, the following may be beyond the point. But it was reasonable for you to think of using a t distribution for something in part (a), and I want to finish that part of the story. Read as much as interests you.
Also, I have to say that using only $n = 8$ observations in this way to 'find' $P(X \ge 20)$ from a normal distribution with unknown $\mu$ and $\sigma$ is simply not a workable idea in practice. As you will see from the confidence intervals (CIs) below, there is a lot of room for random error in estimating $\mu$ and $\sigma$ from only $n=8$ observations.
Estimating the population mean: The t distribution would be appropriate if you were finding a confidence interval based on $\bar X$ for the normal population mean $\mu$ or testing a null hypothesis about $\mu.$
Specifically, the `t.test` procedure in R uses Student's t distribution with $n - 1 = 8 - 1 = 7$ degrees of freedom to find the 95% CI $(9.30, 15.05).$ The CI does not contain 0, and the null hypothesis $H_0: \mu = 0$ is overwhelmingly rejected because of the tiny p-value.
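For anyone who wants to see the arithmetic behind that interval, here is a rough sketch in Python rather than R. The standard library has no t quantile function, so the critical value 2.364624 (the 0.975 quantile of Student's t with 7 degrees of freedom) is hard-coded from a table; that constant and the variable names are my additions, not part of the R output.

```python
from math import sqrt
from statistics import mean, stdev

data = [7.6, 9.6, 10.4, 10.7, 11.9, 14.1, 14.6, 18.5]
n = len(data)

# Assumption: table value of the 0.975 quantile of t with n - 1 = 7 df.
T_CRIT_7DF = 2.364624

xbar = mean(data)
s = stdev(data)

# 95% CI for mu: xbar +/- t* x s / sqrt(n)
half_width = T_CRIT_7DF * s / sqrt(n)
ci_mu = (xbar - half_width, xbar + half_width)

print(f"95% CI for mu: ({ci_mu[0]:.2f}, {ci_mu[1]:.2f})")
```

Note that the t distribution enters through the standard error $S/\sqrt{n}$ of $\bar X,$ which is why it says something about $\mu$ and not directly about $P(X \ge 20).$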
Estimating the population variance and SD: You could also get a 95% CI for $\sigma^2,$ using the chi-squared distribution: $(n-1)S^2/\sigma^2 \sim \mathsf{Chisq}(7).$ A 95% CI for $\sigma^2$ is $(7S^2/U,\, 7S^2/L),$ where $L$ and $U$ cut 2.5% of the area from the lower and upper tails, respectively, of $\mathsf{Chisq}(7).$ Then take square roots of the endpoints to get a 95% CI for $\sigma,$ which is $(2.27, 6.99).$
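The same kind of check works for the interval for $\sigma.$ Again this is a sketch in Python, with the two chi-squared quantiles for 7 degrees of freedom hard-coded from a table (1.689869 and 16.012764); those constants and the names `CHI2_LOWER` and `CHI2_UPPER` are assumptions of mine, playing the roles of $L$ and $U$ above.

```python
from math import sqrt
from statistics import variance

data = [7.6, 9.6, 10.4, 10.7, 11.9, 14.1, 14.6, 18.5]
n = len(data)

# Assumption: table values for Chisq(7).
CHI2_LOWER = 1.689869    # L: cuts 2.5% from the lower tail
CHI2_UPPER = 16.012764   # U: cuts 2.5% from the upper tail

# (n - 1) * S^2, the numerator in both endpoints of the CI for sigma^2.
ss = (n - 1) * variance(data)

ci_var = (ss / CHI2_UPPER, ss / CHI2_LOWER)        # 95% CI for sigma^2
ci_sigma = (sqrt(ci_var[0]), sqrt(ci_var[1]))      # 95% CI for sigma

print(f"95% CI for sigma: ({ci_sigma[0]:.2f}, {ci_sigma[1]:.2f})")
```

The interval is not symmetric about $S \approx 3.43$ because the chi-squared distribution is skewed.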
A session from Minitab software (somewhat edited for relevance) gives essentially the same CI for $\sigma.$
Neither Student's t distribution nor the chi-squared distribution is useful for finding $P(X \ge 20).$ Each is relevant for making a confidence interval for one of the parameters $\mu$ and $\sigma.$
Addendum: Everything works better for large samples. Consider the distribution $\mathsf{Norm}(\mu = 12, \sigma = 3.5),$ for which $P(Y \ge 20) = 0.0111.$
If I generate a random sample of size $n = 800$ from this distribution, the proportion of observations that are 20 or greater is 0.02, not far from 0.0111.
If I pretend I don't know $\mu$ and $\sigma,$ I get respective estimates $\bar Y = 12.22$ and $S = 3.53,$ which are reasonably close to the truth.
The estimated normal distribution gives $P(Y \ge 20) \approx 0.013.$
The 95th percentile of the true distribution is $17.76.$
The 95th percentile of the estimated distribution is $18.02.$
The 95th percentile of the points sampled from the true distribution is $18.15.$
A 95% t confidence interval for $\mu$ is $(11.98, 12.47)$ [no output shown, but trust me on this], which closely brackets the true mean $\mu = 12$ of the distribution that produced the data.
And finally, the density function of $\mathsf{Norm}(\mu = 12, \sigma = 3.5)$ is pretty well matched by a histogram of the $n = 800$ sampled observations.
None of these matches have been perfect, but I hope you can see that the ideas of your problem work a lot better for sample size $n = 800$ than for $n = 8.$
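If you want to repeat the large-sample experiment yourself, something like the following sketch would do. It is written in Python rather than R, so its random sample is not the one that produced the numbers above and the simulated figures will differ somewhat. I take $\sigma = 3.5,$ the value that reproduces the quoted $0.0111$ and $17.76$; the seed and the variable names are arbitrary choices of mine.

```python
import random
from statistics import NormalDist, mean, quantiles, stdev

MU, SIGMA, N = 12.0, 3.5, 800   # assumed true parameters and sample size
random.seed(2024)               # arbitrary seed, only for reproducibility

sample = [random.gauss(MU, SIGMA) for _ in range(N)]

true_dist = NormalDist(MU, SIGMA)
fitted = NormalDist(mean(sample), stdev(sample))  # plug-in estimates

# Three versions of P(Y >= 20): true, relative frequency, fitted normal.
p_true = 1 - true_dist.cdf(20)
p_freq = sum(y >= 20 for y in sample) / N
p_fit = 1 - fitted.cdf(20)

# Three versions of the 95th percentile: true, fitted normal, sample quantile.
q_true = true_dist.inv_cdf(0.95)
q_fit = fitted.inv_cdf(0.95)
q_sample = quantiles(sample, n=20, method="inclusive")[-1]

print(f"estimates: mean {fitted.mean:.2f}, SD {fitted.stdev:.2f}")
print(f"P(Y >= 20): true {p_true:.4f}, freq {p_freq:.4f}, fitted {p_fit:.4f}")
print(f"95th pct:   true {q_true:.2f}, fitted {q_fit:.2f}, sample {q_sample:.2f}")
```

Changing `N` to 8 and rerunning with a few different seeds gives a feel for how unstable the same estimates are at the sample size in your problem.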
Note: I could probably have shown an example with better matches if I had generated a dozen samples of size $n = 800$ and picked the 'best' one. But what you see here is the first sample that came up. Also, I guess it's obvious that an example with $n=80,000$ would have worked better yet.