Hypothesis testing: normal vs. non-normal


I have the following hypothesis testing problem:

$$H_0:X=Y,\quad\text{vs.}\quad H_1:X=Y+Z$$

where $Y\sim\mathcal{N}(0,\sigma^2)$ and $Z$ is a random variable with a non-normal continuous distribution. I am not very familiar with statistics. Is there a well-known way to solve this problem?

Best answer

If data are normally distributed, then the points in a normal probability plot (normal Q-Q plot) tend to lie on a straight line.

In your case, the random variable $X_H = Y$ is normally distributed and the random variable $X_A = Y + Z$ is not. To be specific, suppose we have $n = 100$ observations from $X_H \sim Norm(100, 15)$, that is, mean 100 and standard deviation 15.

Let's look at three relevant plots. The ECDF of a dataset puts probability $1/n$ at each of the $n$ datapoints of a sample. Starting from height 0 at the left, it moves to 1 at the right through $n$ increments of $1/n$.

The ECDF imitates the population CDF, shown as a blue curve in the left plot.
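
As a small illustration of the $1/n$ steps, here is a minimal sketch in R using a made-up sample of five distinct values (the numbers are hypothetical and are not part of the simulated data used in this answer):

```r
# Hypothetical toy sample of n = 5 distinct values
x <- c(3.1, 1.4, 2.7, 5.0, 4.2)
n <- length(x)

Fn <- ecdf(x)            # step function: proportion of observations <= t
heights <- Fn(sort(x))   # ECDF evaluated at the ordered data points
print(heights)           # 0.2 0.4 0.6 0.8 1.0, i.e. steps of 1/n

# Below the smallest observation the ECDF is 0; at the largest it reaches 1
print(Fn(min(x) - 1))
print(Fn(max(x)))
```

With tied observations, the jump at a repeated value would be a multiple of $1/n$ rather than exactly $1/n$.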

In the Q-Q plot at the right, the vertical scale is distorted to make the normal CDF a straight line and points of the ECDF of a normal sample almost a straight line. (Simulated samples and plots are from R statistical software.)

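 x.h = rnorm(100, 100, 15)    # n = 100 observations, mean 100, SD 15
 par(mfrow=c(1,2))            # 2 panels side by side
   plot.ecdf(x.h, pch=20)     # left panel: ECDF of the sample
     curve(pnorm(x, 100, 15), lwd=2, col="blue", add=TRUE)  # population CDF
   qqnorm(x.h, datax=TRUE)    # right panel: Q-Q plot, data on horizontal axis
 par(mfrow=c(1,1))            # back to a single panel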
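
To see what the normal Q-Q plot is made of, the following sketch builds one by hand: sorted data against standard normal quantiles at the plotting positions returned by `ppoints`. The seed and the sample `x` are my own choices for illustration and differ from `x.h` above; the reference line assumes the population mean 100 and SD 15 are known.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
x <- rnorm(100, 100, 15)    # a fresh sample, not the x.h used in the answer
n <- length(x)

p <- ppoints(n)             # plotting positions, (i - 1/2)/n when n > 10
z <- qnorm(p)               # matching standard normal quantiles

# Sorted data on the horizontal axis, as with datax = TRUE
plot(sort(x), z, pch = 20,
     xlab = "Sample Quantiles", ylab = "Theoretical Quantiles")

# For Norm(100, 15) data the points should fall near z = (x - 100)/15
abline(a = -100/15, b = 1/15, col = "blue", lwd = 2)
```

The pairs plotted here are the same ones `qqnorm` computes internally, so the two pictures should agree apart from the added reference line.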

[Figure: ECDF of x.h with the population CDF in blue (left); normal Q-Q plot of x.h (right)]

Now we show Q-Q plots of data from the null (normal) and alternative (non-normal) distributions. I have used $X_A = X_H + Z$, where $Z$ is exponential with mean 50.

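 x.a = x.h + rexp(100, 1/50)   # add exponential noise with rate 1/50 (mean 50)
 par(mfrow=c(1,2))
   qqnorm(x.h, datax=TRUE)     # left panel: normal sample
   qqnorm(x.a, datax=TRUE)     # right panel: non-normal sample
 par(mfrow=c(1,1))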

[Figure: normal Q-Q plot of x.h (left); normal Q-Q plot of x.a (right)]

The random variable $X_A$ is far from normal because of the added exponential component. The nonnormality of $X_A$ results in the markedly nonlinear Q-Q plot on the right.

The Shapiro-Wilk test is one of several tests of normality. Roughly speaking, it measures how close the points in the Q-Q plot are to a straight line, with values of the statistic $W$ near 1 indicating a nearly linear plot. So you don't have to judge linearity just by eye.
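
A rough way to see this connection is to compute the squared correlation between the data and their normal scores, which is close in spirit to the related Shapiro-Francia statistic, and set it beside $W$. This is only a sketch: the helper `qq.r2` is a name I made up, the seed is arbitrary, and the samples are fresh ones rather than `x.h` and `x.a`. The squared correlation is not the Shapiro-Wilk statistic itself, although the two are usually close.

```r
set.seed(2)                          # arbitrary seed, only for reproducibility
x.norm <- rnorm(100, 100, 15)        # normal sample
x.skew <- x.norm + rexp(100, 1/50)   # normal plus exponential with mean 50

# Squared correlation between the data and their normal scores:
# near 1 when the Q-Q plot is nearly straight (hypothetical helper)
qq.r2 <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)^2
}

# Compare with the Shapiro-Wilk W statistic for each sample
res.norm <- c(r2 = qq.r2(x.norm), W = unname(shapiro.test(x.norm)$statistic))
res.skew <- c(r2 = qq.r2(x.skew), W = unname(shapiro.test(x.skew)$statistic))
print(res.norm)
print(res.skew)
```

Both measures should be noticeably smaller for the skewed sample than for the normal one.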

Here are Shapiro-Wilk tests for $X_H$, with P-value far above 5% (consistent with normality), and for $X_A$, with P-value far below 5% (not consistent with normality).

 shapiro.test(x.h)

 ##        Shapiro-Wilk normality test

 ## data:  x.h 
 ## W = 0.9939, p-value = 0.935

 shapiro.test(x.a)

 ##        Shapiro-Wilk normality test

 ## data:  x.a 
 ## W = 0.9127, p-value = 5.913e-06

To make a good demonstration, I have used samples of moderate size and an alternative $X_A$ that is far from normal. For smaller samples or for alternatives that are more nearly normal, you cannot expect such clear-cut results.
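
One way to get a feel for this is a small simulation that estimates how often the Shapiro-Wilk test rejects normality at the 5% level for different sample sizes and different sizes of the exponential component. The function `reject.rate`, the seed, the number of replications, and the particular settings compared are all my own choices for illustration.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

# Estimated probability that shapiro.test rejects normality at level alpha
# when X = Y + Z, with Y ~ Norm(100, 15) and Z exponential with mean z.mean
reject.rate <- function(n, z.mean, reps = 500, alpha = 0.05) {
  p <- replicate(reps, {
    x <- rnorm(n, 100, 15) + rexp(n, 1/z.mean)
    shapiro.test(x)$p.value
  })
  mean(p < alpha)
}

rate.big   <- reject.rate(n = 100, z.mean = 50)  # setting of the demonstration
rate.small <- reject.rate(n = 20,  z.mean = 50)  # same alternative, fewer data
rate.near  <- reject.rate(n = 100, z.mean = 5)   # alternative closer to normal
print(c(big = rate.big, small = rate.small, near = rate.near))
```

The rejection rate should be highest in the first setting and drop when the sample is smaller or when the exponential component is small relative to the normal one.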

This demonstration should get you started down the right path. For more information you can look in a statistics text or online for 'normal probability plot', 'quantile plot', 'Q-Q plot', 'tests of normality', and so on.