Minimize the sum of Type I and Type II errors


Let $X_1,\dots,X_n$ be a simple random sample from a normal distribution $N(\mu, \sigma^2)$, where $\mu$ is unknown and $\sigma$ is known.

Now consider the hypothesis $ \begin{cases} H_0: & \mu=\mu_0 \\ H_1: & \mu=\mu_1 > \mu_0 \end{cases} $

Determine the critical region $R$ in order to minimize the risk $P_{H_0}(R)+P_{H_1}(R^c)$.

I'm not sure how to start this problem, in particular because I'm dealing with $n$ samples here. I believe the test statistic to use is $z=\displaystyle\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$, but I'm not sure how to apply it.
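One way to get a feel for the risk $P_{H_0}(R)+P_{H_1}(R^c)$ is to evaluate it numerically for cutoff regions of the form $\{\bar x > c\}$ and search for the best $c$. The sketch below uses illustrative values of $\mu_0,\mu_1,\sigma,n$ (none are given in the problem) and only the Python standard library; the minimizing cutoff lands at the midpoint $(\mu_0+\mu_1)/2$, which is where the two likelihoods cross.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative values (assumptions, not from the problem statement).
mu0, mu1, sigma, n = 0.0, 1.0, 2.0, 25
se = sigma / math.sqrt(n)  # standard error of xbar

def risk(c):
    """P_{H0}(xbar > c) + P_{H1}(xbar <= c) for the region {xbar > c}."""
    return (1.0 - phi((c - mu0) / se)) + phi((c - mu1) / se)

# Grid search for the cutoff minimizing the total risk.
grid = [mu0 + i * (mu1 - mu0) / 1000 for i in range(1001)]
best_c = min(grid, key=risk)
print(best_c)  # close to the midpoint (mu0 + mu1) / 2
```

This numerical check suggests what a Neyman–Pearson-style likelihood-ratio argument confirms analytically: with equal weights on the two errors, the optimal region rejects when $\bar x$ is closer to $\mu_1$ than to $\mu_0$.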

EDIT

Alright, let's consider the following: for a normal variable with mean $0$ and variance $\sigma^2$, the error function gives $P(|X|<\alpha) = \operatorname{erf}\!\left(\frac{\alpha}{\sigma\sqrt{2}}\right) = \frac{2}{\sqrt{\pi}}\int_0^{\alpha/(\sigma\sqrt{2})} e^{-t^2}\,dt$.

This gives the probability of falling in $(-\alpha,\alpha)$, but I am interested in the rejection region, which is $(-\infty,-\alpha)\cup(\alpha, +\infty)$. Therefore, I think I should consider the complementary error function

$$ \operatorname{erfc}\!\left(\frac{\alpha}{\sigma\sqrt{2}}\right) = 1-\frac{2}{\sqrt{\pi}}\int_0^{\alpha/(\sigma\sqrt{2})} e^{-t^2} \, dt = \frac{2}{\sqrt{\pi}}\int_{\alpha/(\sigma\sqrt{2})}^\infty e^{-t^2}\,dt $$

Now I could differentiate with respect to $\alpha$ and get $\frac{d}{d\alpha}\operatorname{erfc}\!\left(\frac{\alpha}{\sigma\sqrt{2}}\right) = -\sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma}\,e^{-\alpha^2/(2\sigma^2)}$. I should set it to $0$ and solve for $\alpha$, to "solve" the problem.

There are three issues here: (1) $e^{-\alpha^2/(2\sigma^2)}$ is never zero for any $\alpha$.

(2) The hypotheses themselves never entered the derivation.

(3) It is not clear what the $\sigma$ in the error function is. The Wikipedia entry linked above says that errors generally have mean zero, but they may have nonzero variance. Is the $\sigma$ in the normal distribution the very same $\sigma$ in the error function?
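Regarding issue (3): yes, the $\sigma$ in $\operatorname{erfc}(\alpha/(\sigma\sqrt2))$ is the standard deviation of the normal variable, so that $P(X>\alpha)=\tfrac12\operatorname{erfc}\!\big(\alpha/(\sigma\sqrt2)\big)$ for $X\sim N(0,\sigma^2)$. A quick stdlib sanity check of that identity, with illustrative values of $\sigma$ and $\alpha$:

```python
import math

sigma, alpha = 1.5, 2.0  # illustrative values

# Tail probability via the complementary error function:
# P(X > alpha) = 0.5 * erfc(alpha / (sigma * sqrt(2))) for X ~ N(0, sigma^2).
p_erfc = 0.5 * math.erfc(alpha / (sigma * math.sqrt(2.0)))

# Same probability by crude numerical integration of the normal density.
def density(x):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

step, width = 1e-4, 20.0
p_num = sum(density(alpha + i * step) * step for i in range(int(width / step)))
print(p_erfc, p_num)  # the two values agree to several decimals
```

The agreement confirms the $\tfrac{2}{\sqrt\pi}$ normalization and that $\sigma$ enters only through the rescaled argument $\alpha/(\sigma\sqrt2)$.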

On BEST ANSWER

When the population SD $\sigma$ is unknown, and hence estimated by the sample SD $S,$ the appropriate test statistic is $T = \frac{\bar X - \mu_0}{S/\sqrt{n}}.$ Under the null hypothesis $H_0: \mu = \mu_0,$ the test statistic follows Student's t distribution with $n-1$ degrees of freedom.

You can use printed tables or software to find the critical value $t^*.$ Because this is a right-tailed test (against $H_a: \mu > \mu_1 > \mu_0$), you would choose $t^*$ that cuts 5% from the upper tail of the distribution, and reject $H_0$ at the 5% level of significance when $T > t^*.$ For example, if $n = 16$ then $t^* = 1.753$ (with $n-1 = 15$ degrees of freedom) is the critical value for a test at level 5%.
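The decision rule above can be sketched in a few lines. The data and null mean here are hypothetical (purely for illustration), and the critical value is taken from a printed t table for the sample size used:

```python
import math

# Hypothetical sample and null mean (illustration only).
x = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0, 5.2]
mu0 = 4.7

n = len(x)
xbar = sum(x) / n
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample SD
T = (xbar - mu0) / (s / math.sqrt(n))                        # t statistic

t_star = 1.895  # tabled critical value: df = 7, upper-tail 5%
print(T > t_star)  # True means: reject H0 at the 5% level
```

With software (e.g. SciPy's `scipy.stats.t.ppf(0.95, df)`) the table lookup can be replaced by a direct quantile computation.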

This is a standard 'one-sample t test'. It assumes that the population is normal or nearly normal. As the sample size $n$ increases the assumption of normality becomes somewhat less important. An alternative test due to Wilcoxon for the population median does not require normal data, but works best if the population distribution is roughly symmetrical and when there are no tied values in the data. This nonparametric rank-based test is called the 'Wilcoxon signed-rank test'. If neither of these tests is appropriate, then one of several kinds of 'permutation test' might be used.

If you are in doubt whether the one-sample t test is appropriate for your data, then perhaps you can post your data or a histogram of it, and one of us can help you decide.

Your use of "$P_{H_0}(R)+P_{H_1}(R^c)$" (the sum of the Type I and Type II error probabilities) is premature. First, we need to settle what kind of test is to be used. In practice, even if we know it is a one-sample t test, evaluating that sum would require a guess at $\sigma$ and knowledge of both the sample size and the difference $\mu_1 - \mu_0$. Not to be snarky, but an infinite sample size would drive both error probabilities to 0.
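The last point is easy to illustrate: holding the cutoff fixed at the midpoint of $\mu_0$ and $\mu_1$, both error probabilities shrink as $n$ grows. The parameters below are assumptions chosen for the illustration, not values from the post:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative parameters (not given in the original post).
mu0, mu1, sigma = 0.0, 1.0, 2.0
c = (mu0 + mu1) / 2.0  # fixed midpoint cutoff for the region {xbar > c}

def total_error(n):
    """Type I plus Type II error probability at sample size n."""
    se = sigma / math.sqrt(n)
    alpha = 1.0 - phi((c - mu0) / se)  # P_{H0}(xbar > c)
    beta = phi((c - mu1) / se)         # P_{H1}(xbar <= c)
    return alpha + beta

print(total_error(10), total_error(100), total_error(1000))
# the total error decreases toward 0 as n grows
```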

Terminology in 'single quotes' can be found in most basic statistics texts, and some online accounts are authoritative.