Understanding the confidence interval and statistical significance


I am struggling to understand confidence intervals and their relationships to a null hypothesis.

The basic relationship is: the confidence level is (1 − α), where α is the significance level.

So let's say I have two cases:

  1. I've chosen a 70 percent confidence level for testing my null hypothesis; this means I have a significance level of α = .30.

  2. I've chosen a 95 percent confidence level, so α = .05.

Is it always better to use a 95 percent confidence level, since I then have a higher probability of not rejecting my null hypothesis? With a 70 percent confidence level, I always have a higher probability of falling outside the (1 − α) region, which leads to rejecting my null hypothesis.

To me, having 95 percent confidence always seems better. Is there ever a reason to prefer a 70 percent confidence interval? Would the 95 percent confidence scenario require fewer resources for sampling?
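To make the tradeoff concrete, here is a small R sketch (the summary statistics are made up for illustration) showing that, for the same data, a 95% confidence interval is necessarily wider than a 70% one:

```r
# Hypothetical summary statistics for illustration only
n <- 50; xbar <- 97; s <- 16
se <- s / sqrt(n)                       # standard error of the mean
for (conf in c(0.70, 0.95)) {
  alpha <- 1 - conf
  half <- qt(1 - alpha/2, df = n - 1) * se   # t critical value times SE
  cat(sprintf("%.0f%% CI: (%.2f, %.2f), width %.2f\n",
              100 * conf, xbar - half, xbar + half, 2 * half))
}
```

The 95% interval is wider because its t critical value is larger; higher confidence is bought with a less precise interval, not with fewer samples.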


You are correct that a 70% CI corresponds to a significance level of 30%, and a 95% CI to a significance level of 5%.

Consider the following fictitious data: A sample of size $n=50$ from a normal population with $\mu = 100$ and $\sigma = 15.$

set.seed(905)
x = rnorm(50, 100, 15)

Some summary statistics are as follows:

summary(x);  length(x);  sd(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  62.76   85.32   98.02   96.99  108.22  143.80 
[1] 50           # sample size
[1] 16.28875     # sample standard deviation

stripchart(x, pch="|")

[Strip chart of the 50 sampled values]

Suppose you want to test $H_0: \mu = 101.6$ against $H_a: \mu\ne 101.6.$ Results from a one-sample t test, t.test in R, are as shown below.

The P-value of the test is $0.051 > 0.05 = 5\%,$ so you (just barely) miss rejecting at the 5% level. Also, the 95% CI for $\mu$ is $(92.36, 101.62),$ which does (just barely) contain the hypothetical value $101.6.$ You can say that the 95% CI is the interval of values of $\mu$ which cannot be rejected at the 5% level. [95% CIs are the default in R; if you want some other level of confidence, you have to say so.]

t.test(x, mu = 101.6)

        One Sample t-test

data:  x
t = -2, df = 49, p-value = 0.05106
alternative hypothesis: true mean is not equal to 101.6
95 percent confidence interval:
  92.36363 101.62205
sample estimates:
mean of x 
 96.99284 
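That duality between the 95% CI and 5%-level tests can be checked directly: any value of $\mu$ inside the CI yields a P-value above 0.05, and any value outside yields one below 0.05. A quick sketch, reusing the same seeded data:

```r
set.seed(905)
x <- rnorm(50, 100, 15)
ci <- t.test(x)$conf.int            # 95% CI, the R default
inside  <- mean(ci)                 # a value of mu inside the interval
outside <- ci[2] + 1                # a value of mu just above the interval
t.test(x, mu = inside)$p.value      # above 0.05: not rejected at the 5% level
t.test(x, mu = outside)$p.value     # below 0.05: rejected at the 5% level
```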

By contrast, if you specify a 70% CI, here are results from 't.test'. The P-value is again $0.051,$ but if you want to test at the 30% level, you can reject because $0.051 < 0.30 = 30\%.$

Notice that the 70% CI is much shorter than the earlier 95% CI. In particular, the 70% CI $(94.58, 99.41)$ does not (even nearly) contain the hypothetical value $101.6.$

t.test(x, mu = 101.6, conf.lev=.7)

        One Sample t-test

data:  x
t = -2, df = 49, p-value = 0.05106
alternative hypothesis: true mean is not equal to 101.6
70 percent confidence interval:
 94.57980 99.40588
sample estimates:
mean of x 
 96.99284 
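The 70% interval is nothing mysterious: it is the usual $\bar x \pm t^* \, s/\sqrt{n},$ where $t^*$ is the $1 - \alpha/2 = 0.85$ quantile of Student's t distribution with $n-1 = 49$ degrees of freedom. Computing it by hand reproduces the 't.test' output:

```r
set.seed(905)
x <- rnorm(50, 100, 15)
n  <- length(x)
se <- sd(x) / sqrt(n)                     # standard error of the mean
tstar <- qt(1 - 0.30/2, df = n - 1)       # t critical value for 70% confidence
c(mean(x) - tstar * se, mean(x) + tstar * se)   # matches (94.57980, 99.40588)
```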

Note: The significance level of a test specifies the probability of (falsely) rejecting a true parameter value. A 30% significance level indicates a willingness to make that mistake almost a third of the time. For my fictitious data we happen to know the true value $\mu = 100.$ Because the 70% CI does not cover $100,$ we know that the t test would wrongly reject $H_0: \mu=100$ for my fictitious data. [With real data it is very rare to know the exact true parameter value, so one won't know when a false rejection has been made.]
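You can verify that error rate by simulation: repeatedly sampling from the population, testing the true mean $\mu = 100$ at the 30% level, and counting how often the test falsely rejects. (The seed and replication count here are arbitrary choices.)

```r
# Testing the TRUE mean at significance level 0.30 should falsely
# reject in roughly 30% of repeated samples.
set.seed(2024)
reject <- replicate(10000,
  t.test(rnorm(50, 100, 15), mu = 100)$p.value < 0.30)
mean(reject)    # close to 0.30
```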