To calculate the probability of observing a new p-value that is greater than the current p-value

To calculate the test statistic realization, I was able to create my ANOVA table in R for my data-set.

> summary(mod)
            Df Sum Sq Mean Sq F value   Pr(>F)    
diet         3    228    76.0   13.57 4.66e-05 ***
Residuals   20    112     5.6                     
---

Now I need to answer: if H0 is true and we independently repeat the experiment, what is the probability of observing a new p-value that is greater than the current p-value?

I understand that I basically need to find the probability that Pr(>F) > 4.66e-05. Isn't this just asking for the probability that the new test statistic is lower than the current one? How do I calculate that? Wouldn't that depend on the pooled standard deviation and the mean squares?

There are 1 best solutions below

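If the null hypothesis is true. For a continuous test statistic under $H_0,$ the p-value, considered as a random variable, has the distribution $\mathsf{Unif}(0,1).$ If $Y \sim \mathsf{Unif}(0,1),$ then $P(Y > 4.66 \times 10^{-5}) = 1 - 4.66 \times 10^{-5} \approx 0.99995 \approx 1.$ Equivalently, this is the probability under $H_0$ that the new F-statistic falls below the observed $F = 13.57.$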
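As a quick numerical check of this statement, here is a minimal sketch (the variable names are illustrative, not from the question). It computes the probability once from the uniform distribution of the p-value and once from the central F(3, 20) distribution of the test statistic; the two should agree up to the rounding of the values printed in the ANOVA table.

```r
p_obs <- 4.66e-05   # observed p-value from the ANOVA table
f_obs <- 13.57      # observed F statistic (3 and 20 degrees of freedom)

# P(new p-value > p_obs) when the p-value is Unif(0,1) under H0
prob_unif <- punif(p_obs, lower.tail = FALSE)

# The same event expressed as P(new F < f_obs) under the central F(3, 20)
prob_f <- pf(f_obs, df1 = 3, df2 = 20)

print(c(uniform = prob_unif, central_F = prob_f))
```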

An intuitive graphical demonstration of the uniformity of p-values under $H_0$ by simulation is as follows. I have used 4 groups with 6 replications per group, as in your ANOVA. $H_0$ is true, with all four group means $\mu_i = 50$ and common standard deviation $\sigma = 2.4$ (variance $\sigma^2 = 5.76,$ close to the 5.6 suggested by your ANOVA table). The index [1,5] extracts the p-value (row 1, column Pr(>F)) from each ANOVA table, for $m = 10^5$ such datasets.

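set.seed(4818)
gp = factor(rep(1:4, each=6))   # factor, so lm() fits 4 groups (3 and 20 df)
m = 10^5;  pv = numeric(m)
for(i in 1:m) {
  x = rnorm(24, 50, 2.4)        # H0 true: all 24 observations share one mean
  pv[i] = anova(lm(x~gp))[1,5]  # row 1, column 5 of the table is Pr(>F)
}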
summary(pv)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000067 0.2491900 0.4970388 0.4980652 0.7480676 0.9999857 

hist(pv, prob=TRUE, xlim=c(-.1,1.1), col="skyblue2")
curve(dunif(x), add=TRUE, lwd=2, col="blue", n=10001)

[Figure: histogram of the simulated p-values with the Unif(0,1) density curve overlaid]
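The same idea can be checked numerically rather than graphically. The following is a minimal sketch under stated assumptions: the seed and the smaller number of datasets are arbitrary choices for speed, and `oneway.test(..., var.equal = TRUE)` is used here as a stand-in for the `anova(lm(...))` call, since it performs the same equal-variance one-way F-test.

```r
set.seed(1)                       # arbitrary seed for this sketch
gp <- factor(rep(1:4, each = 6))  # 4 groups, 6 replications each
m  <- 5000                        # fewer datasets than in the answer, for speed

pv <- replicate(m, {
  x <- rnorm(24, mean = 50, sd = 2.4)            # H0 true: equal group means
  oneway.test(x ~ gp, var.equal = TRUE)$p.value  # classical one-way ANOVA p-value
})

# Proportion of simulated p-values above the observed one
prop_above <- mean(pv > 4.66e-05)
print(prop_above)

# Under uniformity the quartiles should be near 0.25, 0.50, 0.75
print(quantile(pv, c(0.25, 0.50, 0.75)))
```

With only a few thousand datasets, a p-value as small as 4.66e-05 is expected to appear rarely, if at all, so the proportion is essentially 1.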

Based on the observed value of the F-statistic for your particular ANOVA. Under the alternative hypothesis, the p-value is no longer uniform; its distribution puts more probability on values near 0.

Outline. You observed $F = 13.57.$ However, $F$ has a noncentral F-distribution if $H_0$ is not true. You can use MS(Diet) to estimate the noncentrality parameter. The degrees of freedom ndf = 3 and ddf = 20 are as shown in the ANOVA table. Then in R use pf (the CDF; df is the density) with the noncentrality parameter ncp as the fourth argument to get the desired probability. This is related to the power of the F-test.
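The outline above can be sketched as follows. This is only an illustration: the plug-in estimate of the noncentrality parameter, $\hat\lambda = \mathrm{SS(Diet)}/\mathrm{MS(Residuals)} = \mathrm{ndf} \cdot F,$ is an assumption of this sketch (it tends to overestimate $\lambda$; subtracting ndf is one common adjustment), and the variable names are illustrative.

```r
f_obs    <- 13.57   # observed F statistic
ndf      <- 3       # numerator df (diet)
ddf      <- 20      # denominator df (residuals)
ss_diet  <- 228     # Sum Sq for diet, from the ANOVA table
ms_resid <- 5.6     # Mean Sq for residuals, from the ANOVA table

# Crude plug-in estimate of the noncentrality parameter (assumption of this sketch)
lambda_hat <- ss_diet / ms_resid

# P(new F < f_obs), i.e. P(new p-value > observed p-value):
prob_null <- pf(f_obs, ndf, ddf)                    # if H0 is true (central F)
prob_alt  <- pf(f_obs, ndf, ddf, ncp = lambda_hat)  # if the estimated alternative holds

print(c(lambda_hat = lambda_hat, under_H0 = prob_null, under_alt = prob_alt))
```

Under the estimated alternative, the probability of a larger p-value is far below the near-certainty obtained under $H_0.$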

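In a similar simulation with means $\mu_1 = \mu_2 = 50$ and $\mu_3 = \mu_4 = 60,$ the simulated distribution of $10^6$ F-statistics (blue histogram) is compared with the distribution $\mathsf{F}(3,20)$ (red density curve). At each iteration the data are sampled using x = c(rnorm(12, 50, 2.4), rnorm(12, 60, 2.4)). With these means and $\sigma = 2.4,$ the noncentrality parameter is $\lambda = 24(5^2)/2.4^2 \approx 104.2,$ so the F-statistics fall far to the right of the null distribution. The exact summary values depend on how gp is coded: with gp as a factor (a 3-df ANOVA) the theoretical mean is $20(3+\lambda)/(3 \cdot 18) \approx 39.7,$ whereas the mean near 46.6 shown below appears consistent with gp having been treated as numeric (a 1-df linear trend).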

summary(f)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  9.692  34.635  43.687  46.649  55.247 213.169 

[Figure: histogram of the simulated F-statistics (blue) with the F(3,20) density curve (red)]
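The code for this second simulation is not shown in the answer. A minimal sketch of how it might look is below, assuming gp is coded as a factor, an arbitrary seed, and far fewer iterations than $10^6$ for speed. With gp as a factor, the simulated mean should be close to the theoretical mean of the noncentral F(3, 20) distribution (about 39.7), so it is not expected to reproduce the posted summary exactly.

```r
set.seed(2)                       # arbitrary seed for this sketch
gp <- factor(rep(1:4, each = 6))  # factor, so lm() fits a 4-group ANOVA (3 df)
m  <- 2000                        # far fewer than 10^6, for speed
f  <- numeric(m)

for (i in 1:m) {
  # Groups 1-2 have mean 50, groups 3-4 have mean 60; common sd 2.4
  x <- c(rnorm(12, 50, 2.4), rnorm(12, 60, 2.4))
  f[i] <- anova(lm(x ~ gp))[1, 4]   # row 1, column 4 of the table is "F value"
}

# Noncentrality parameter: sum over observations of (mu_i - grand mean)^2 / sigma^2
lambda    <- 24 * 5^2 / 2.4^2
# Mean of the noncentral F(ndf = 3, ddf = 20, lambda) distribution
theo_mean <- 20 * (3 + lambda) / (3 * (20 - 2))

print(c(simulated = mean(f), theoretical = theo_mean))

# Proportion of simulated F-statistics below the observed F = 13.57
print(mean(f < 13.57))
```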