How to compute a p-value for a difference of variance?


Given a large population (size $n$) separated into some groups (for simplicity of equal size $k$), each population member is assigned a $1$ (true) or a $0$ (false). The population mean is $p$.

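Null hypothesis: the number of 1s in each group is binomially distributed, hence the group means follow a distribution with mean $p$ and variance $p(1-p)/k$ (the group counts have variance $kp(1-p)$, and dividing a count by $k$ divides its variance by $k^2$).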

Alternative hypothesis: the differences between the group means are not consistent with a binomial distribution but are due to some other cause.

How do I compute a p-value for the null hypothesis?

In my actual data the variance between the group means is much bigger than what one would expect under a binomial distribution. I want to use this p-value as a justification that the difference between groups is not just due to chance.

I already posted this question on Cross Validated at https://stats.stackexchange.com/questions/327111/how-to-compute-a-p-value-for-a-difference-of-variance but did not get any answers or comments.

Edit in response to comment: The actual data is health care data, so $n$ is around 10 million people. I have 400 groups that correspond to geographic regions with 10000 to 500000 people each. $p$ is simply computed from the overall population and lies between 0.1% and 10% depending on the application. For $p$ around 10% the binomial predicts very little variation (a standard deviation of less than 1%, if I recall correctly), but I get a range of 7% to 15% with a standard deviation of around 2%.
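To make the size of that gap concrete, the binomial standard deviation of a group proportion, $\sqrt{p(1-p)/k}$, can be evaluated at the two ends of the quoted group-size range. This is only a minimal sketch: the function name is made up for illustration, and $p = 0.10$ and the two group sizes are taken from the figures above rather than from the actual data.

```python
import math


def binomial_sd_of_proportion(p: float, k: int) -> float:
    """Standard deviation of a group proportion when the count is Binomial(k, p)."""
    return math.sqrt(p * (1.0 - p) / k)


# Illustrative inputs: smallest and largest group size quoted above, p = 10%.
for k in (10_000, 500_000):
    sd = binomial_sd_of_proportion(0.10, k)
    print(f"k = {k:>7}: sd = {100 * sd:.3f} percentage points")
```

Even for the smallest groups this gives about 0.3 percentage points, far below the observed spread of roughly 2 percentage points.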

Best answer

I used a Cramér-von Mises test, as explained here: https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93von_Mises_criterion

The test measures the discrepancy between the empirical cumulative distribution function of the observed data and the cumulative distribution function expected under the null hypothesis, here the binomial distribution described in the question. The squared difference between the two is integrated over the entire domain, and the resulting statistic is compared with tabulated critical values. Wikipedia is unfortunately not very specific about which values to compare against, but the statistical software R is: the test is available in the package goftest as cvm.test, which returns a p-value, exactly what I wanted.
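For readers without R, the idea can be sketched in plain Python. This is not what cvm.test does internally, only an illustration under several assumptions: group proportions are standardized with a normal approximation to the binomial (so that groups of different size share one null distribution), the p-value comes from Monte Carlo simulation rather than tabulated values, $p$ is treated as known, and all function names and data below are made up.

```python
import math
import random


def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standardize(successes, sizes, p):
    """z-score of each group proportion under Binomial(k_i, p), normal approximation."""
    return [(x / k - p) / math.sqrt(p * (1.0 - p) / k)
            for x, k in zip(successes, sizes)]


def cvm_statistic(values, cdf):
    """Cramér-von Mises statistic: 1/(12n) + sum(((2i - 1)/(2n) - F(x_(i)))^2)."""
    xs = sorted(values)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - cdf(x)) ** 2 for i, x in enumerate(xs, start=1)
    )


def monte_carlo_p_value(observed, n, draws=2000, seed=0):
    """Share of simulated null statistics at least as large as the observed one."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if cvm_statistic(sim, normal_cdf) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (draws + 1)


# Hypothetical data: 12 regions with made-up sizes and rates, p assumed to be 10%.
p = 0.10
sizes = [10_000, 20_000, 30_000, 50_000, 80_000, 100_000,
         150_000, 200_000, 250_000, 300_000, 400_000, 500_000]
rates = [0.07, 0.12, 0.09, 0.13, 0.08, 0.11,
         0.10, 0.15, 0.09, 0.12, 0.08, 0.11]
successes = [round(r * k) for r, k in zip(rates, sizes)]

z = standardize(successes, sizes, p)
statistic = cvm_statistic(z, normal_cdf)
p_value = monte_carlo_p_value(statistic, len(z))
print(f"statistic = {statistic:.3f}, Monte Carlo p-value = {p_value:.4f}")
```

Because the Monte Carlo estimate cannot go below 1/(draws + 1), the number of draws sets the smallest p-value that can be reported; when $p$ is estimated from the same data, the null distribution of the statistic changes somewhat, which this sketch ignores.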