Exercise - Student t distribution / Normal distribution - determine the proportion of samples exceeding some value

Is my reasoning on the following exercise correct? I am not sure if I correctly used Student t distribution instead of normal distribution and whether my calculation of std makes sense.

Exercise: "Four samples are taken and averaged every hour from a production line and the hourly measurement of a particular impurity is recorded. Approximately one out of six of these averages exceeds 1.5% when the mean value is approximately 1.4%. State assumptions that would enable you to determine the proportion of the individual readings exceeding 1.6%. Make the assumptions and do the calculations. Are these assumptions likely to be true? If not, how might they be violated?"

My attempted solution:

$n=4$, $df=3$
$\mu = 1.4\%$
$P(x>1.5\%) = 1/6$
$P(x>1.6\%) = ?$

Since the sample size is small and we don't know the population std, we need to use the Student-t distribution (with 3 degrees of freedom). For that, we need to assume that the samples are IID. To compute the t-score we need $\sigma$, which is not given, but we can compute it from the information that $P(x>1.5\%) = 1/6 \approx 0.167$. It means that we can compute the value of t-statistic with inverse cdf of $1-0.167=0.833$. Inverse cdf for t-distribution with 3 degrees of freedom for $0.833$ is approximately $1.15$. From that we know: $$1.15 = \frac{x-\mu}{\sigma/\sqrt{n}} = \frac{1.5\%-1.4\%}{\sigma/\sqrt{4}} = \frac{0.1\%}{\sigma/2}$$ hence $\sigma = 0.174\%$.
Now we can compute the t-score for $1.6\%$ which is: $$t = \frac{1.6\%-1.4\%}{0.174\%/2} \approx 2.3$$ Next, reading from t-tables, we can check the value of $P(x>1.6\%)=P(t>2.3) \approx 0.05$, so there is around $5\%$ chance of getting a reading greater than $1.6\%$.

The IID assumption might not hold if the samples were taken from different sources, or if the samples were not independent of each other.

There is 1 best solution below

By assuming that

  1. Each measurement follows a normal distribution $N(1.4,\sigma^2)$ (a reasonable assumption for many measurements),
  2. The measurements are iid (as they are sampled randomly from the same line at a certain time),
  3. The process is in control (the mean and variance remain unchanged).

we have that the hourly average $\bar{X} \sim N(1.4,\frac{\sigma^2}{4})$. Then, from

$$P(\bar{X}>1.5\%) = P \left (Z=\frac{\bar{X}-1.4}{\sigma/\sqrt{4}}>\frac{1.5-1.4}{\sigma/\sqrt{4}} \right) =1-\Phi\left (\frac{1.5-1.4}{\sigma/\sqrt{4}} \right)=1/6,$$

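you can obtain $\sigma$, and then compute the proportion of individual readings $X \sim N(1.4,\sigma^2)$ exceeding $1.6\%$, which is what the exercise asks for:

$$P(X>1.6\%).$$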
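
For concreteness, here is a sketch of how the numbers might work out under the normality and iid assumptions above. The symbol $z$ is introduced here only for illustration, and all values are rounded. Solving the tail condition for $\sigma$:

$$z=\Phi^{-1}(5/6)\approx 0.967,\qquad \frac{1.5-1.4}{\sigma/\sqrt{4}}=z \;\Longrightarrow\; \sigma=\frac{0.2}{z}\approx 0.207\%.$$

For an individual reading $X\sim N(1.4,\sigma^2)$, the threshold $1.6\%$ is $0.2/\sigma=z$ standard deviations above the mean, so

$$P(X>1.6\%)=1-\Phi\left(\frac{1.6-1.4}{\sigma}\right)=1-\Phi(z)=\frac{1}{6}\approx 0.167,$$

whereas for the hourly average the same threshold is $2z$ standard deviations out:

$$P(\bar{X}>1.6\%)=1-\Phi\left(\frac{1.6-1.4}{\sigma/\sqrt{4}}\right)=1-\Phi(2z)\approx 0.027.$$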

The t distribution is not needed here: it applies when $\sigma$ has to be estimated from the sample, whereas here $\sigma$ can be computed directly from the stated tail probability.
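
As a rough numerical check of the normal-based calculation, the following minimal sketch uses only Python's standard library (`statistics.NormalDist`). The variable names are illustrative, and the normality and iid assumptions listed above are taken for granted.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

mu = 1.4               # process mean, in percent
n = 4                  # readings per hourly average
p_avg_exceeds = 1 / 6  # given: P(hourly average > 1.5%)

# 1 - Phi(z) = 1/6  =>  z = Phi^{-1}(5/6),
# where z = (1.5 - mu) / (sigma / sqrt(n)).
z = std_normal.inv_cdf(1 - p_avg_exceeds)
sigma = (1.5 - mu) * n ** 0.5 / z  # sd of an individual reading

individual = NormalDist(mu, sigma)          # single reading X
average = NormalDist(mu, sigma / n ** 0.5)  # hourly average of n readings

p_individual = 1 - individual.cdf(1.6)
p_average = 1 - average.cdf(1.6)

print(f"z = {z:.4f}, sigma = {sigma:.4f}%")
print(f"P(X > 1.6%)    = {p_individual:.4f}")
print(f"P(Xbar > 1.6%) = {p_average:.4f}")
```

Under these assumptions the proportion of individual readings above $1.6\%$ comes out equal to the given $1/6$, because $1.6\%$ is as many individual-reading standard deviations above the mean as $1.5\%$ is standard errors of the average.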

If the sample is taken fully at random at a given time, then in practice only the first assumption is likely to be violated. In that case $\bar{X}$ may not follow $N(1.4,\frac{\sigma^2}{4})$, because a sample size of $4$ is too small to rely on the CLT; for a larger sample size ($\ge 30$) the CLT can be used. Note that for some population distributions (such as the uniform distribution) the distribution of $\bar{X}$ is already very similar to a normal distribution even for $n=4$.
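
The remark about the uniform distribution can be explored with a small simulation. This is only an illustrative sketch: the choice of Uniform(0, 1), the seed, the number of trials, and the "one standard deviation above the mean" threshold are all assumptions made here, not part of the exercise.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

n = 4             # readings per average
trials = 200_000  # number of simulated averages

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the mean of n draws
# has mean 1/2 and standard deviation sqrt(1 / (12 * n)).
mean_sd = (1 / (12 * n)) ** 0.5
threshold = 0.5 + mean_sd  # one standard deviation above the mean

# Count how many simulated averages exceed the threshold.
exceed = sum(
    sum(random.random() for _ in range(n)) / n > threshold
    for _ in range(trials)
)
simulated_tail = exceed / trials
normal_tail = 1 - NormalDist().cdf(1.0)  # tail a normal model would give

print(f"simulated P(mean > mu + 1 sd) = {simulated_tail:.4f}")
print(f"normal    P(Z > 1)            = {normal_tail:.4f}")
```

The two tail probabilities should land within about a percentage point of each other, which is the sense in which the average of four uniform readings is already close to normal. A skewed population would generally show a larger gap.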