To compute the realized test statistic, I created the ANOVA table in R for my data set.
    > summary(mod)
                Df Sum Sq Mean Sq F value   Pr(>F)
    diet         3    228    76.0   13.57 4.66e-05 ***
    Residuals   20    112     5.6
    ---
Now I need to answer: if $H_0$ is true and we independently repeat the experiment, what is the probability of observing a new p-value that is greater than the current p-value?
As I understand it, I need the probability that a new Pr(>F) exceeds 4.66e-05. Isn't this just the probability that the new test statistic is lower than the current one? How do I calculate that, given that it would seem to depend on the pooled standard deviation and the mean squares?
If the null hypothesis is true:

Comment. For a continuous test statistic under $H_0,$ the p-value considered as a random variable has distribution $\mathsf{Unif}(0,1).$ If $Y \sim \mathsf{Unif}(0,1),$ then $P(Y > 4.66 \times 10^{-5}) = 1 - 4.66 \times 10^{-5} \approx 1.$ This does not depend on the pooled standard deviation or the mean squares.
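As a quick check of the claim above, the tail probability can be computed directly in R. This is only a minimal sketch: the names `p_obs` and `f_obs` are mine, and the values are read off the ANOVA table in the question. It also shows that "new p-value larger" is the same event as "new F-statistic smaller", so both routes should agree under $H_0$ up to rounding of the table entries.

```r
p_obs <- 4.66e-05   # observed p-value from the ANOVA table
f_obs <- 13.57      # observed F statistic (rounded in the table)

# Under H0 the p-value is Unif(0,1), so P(new p > p_obs) = 1 - p_obs
prob_unif <- punif(p_obs, lower.tail = FALSE)

# Same event in terms of the statistic: new F < f_obs, with F ~ F(3, 20) under H0
prob_f <- pf(f_obs, df1 = 3, df2 = 20)

prob_unif
prob_f
```

Both numbers should be essentially 1, differing only because `f_obs` and `p_obs` are rounded.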
An intuitive graphical demonstration of this uniformity of p-values under $H_0$ by simulation is as follows. I have used 4 groups with 6 replications per group, as in your ANOVA. $H_0$ is true, with all four group means $\mu_i = 50$ and group variances $\sigma^2 \approx 5.6,$ as suggested by your ANOVA table. Indices `[1,5]` access the p-values of $m = 10^5$ such datasets.

Based on the observed value of the F-statistic for your particular ANOVA:

Under the alternative hypothesis, the p-value is no longer uniform, but puts more probability on values near 0.
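The simulation described above, and the contrast between the null and alternative cases, can be sketched roughly as follows. This is not the original code, only a reconstruction under stated assumptions: the helper `sim_pval`, the seed, the reduced count `m = 2000` (instead of $10^5,$ to keep the run short) and the alternative means 50, 50, 53, 53 are all my own illustrative choices.

```r
set.seed(2024)                      # arbitrary seed, for reproducibility only
m <- 2000                           # fewer datasets than 10^5, to keep the run short
g <- factor(rep(1:4, each = 6))     # 4 groups, 6 replications each
sigma <- sqrt(5.6)                  # residual SD suggested by MS(Residuals)

# p-value of the one-way ANOVA F-test for one simulated dataset
# (hypothetical helper; mu holds the four group means)
sim_pval <- function(mu) {
  x <- rnorm(24, mean = rep(mu, each = 6), sd = sigma)
  summary(aov(x ~ g))[[1]][1, 5]    # row 1 (group effect), column 5 (Pr(>F))
}

pv_null <- replicate(m, sim_pval(c(50, 50, 50, 50)))  # H0 true
pv_alt  <- replicate(m, sim_pval(c(50, 50, 53, 53)))  # assumed alternative

mean(pv_null > 4.66e-05)   # should be very close to 1
mean(pv_null < 0.05)       # should be near 0.05 if p-values are uniform
mean(pv_alt  < 0.05)       # noticeably larger: mass piles up near 0
```

Calling `hist(pv_null, prob = TRUE)` afterwards should give a roughly flat histogram, while `hist(pv_alt, prob = TRUE)` should be concentrated near 0.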
Outline. You observed $F = 13.57.$ However, $F$ has a noncentral F-distribution if $H_0$ is not true. You can use MS(Diet) to estimate the noncentrality parameter. The degrees of freedom ndf = 3 and ddf = 20 are as shown in the ANOVA table. Then in R use `pf` (the distribution function; `df` gives the density) with noncentrality as the fourth parameter to get the desired probability. This is related to the topic of the power of the F-test.

In a similar simulation with means $\mu_1=\mu_2 = 50;\ \mu_3 = \mu_4 = 60,$ the simulated distribution of $10^6$ F-statistics (blue histogram) is compared with the distribution of $\mathsf{F}(3,20)$ (red density curve). At each iteration the data are sampled using

    x = c(rnorm(12, 50, 2.4), rnorm(12, 60, 2.4))
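The outline above might be carried out along the following lines. Treat this as a sketch under assumptions rather than a definitive calculation: the plug-in estimate `lambda_hat` (based on $E[\mathrm{SS(Diet)}] = \sigma^2(\text{ndf} + \lambda)$ with MS(Residuals) standing in for $\sigma^2$) is one rough choice among several, and the seed, the 2000 iterations and the cut-off 30 used in the check are arbitrary.

```r
ndf <- 3; ddf <- 20
f_obs  <- 13.57                 # observed F from the ANOVA table
ss_trt <- 228; ms_res <- 5.6    # SS(diet) and MS(Residuals)

# Assumed plug-in estimate of the noncentrality parameter:
# E[SS(diet)] / sigma^2 = ndf + lambda, with MS(Residuals) estimating sigma^2
lambda_hat <- max(ss_trt / ms_res - ndf, 0)

# P(new F < f_obs), i.e. P(new p-value > observed p-value), if lambda = lambda_hat
prob_alt <- pf(f_obs, ndf, ddf, ncp = lambda_hat)

# Check against the simulation setup: means 50, 50, 60, 60, SD 2.4, 6 per group
set.seed(1)                             # arbitrary seed
g <- factor(rep(1:4, each = 6))
lambda_sim <- 24 * 5^2 / 2.4^2          # sum n_i (mu_i - mean(mu))^2 / sigma^2
f_sim <- replicate(2000, {
  x <- c(rnorm(12, 50, 2.4), rnorm(12, 60, 2.4))
  summary(aov(x ~ g))[[1]][1, 4]        # column 4 holds the F value
})

# Simulated vs noncentral-F probability that F falls below 30
c(simulated = mean(f_sim < 30), noncentral = pf(30, ndf, ddf, ncp = lambda_sim))
```

If the noncentral F model is appropriate, the two numbers in the last line should agree to within simulation error, and `prob_alt` should be far below the value of nearly 1 obtained under $H_0.$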