Combining P-Values from multiple trials of the same experiment


This is my first question here. A little background about me: I'm a biomedical engineer, I'm doing a PhD in Neuroscience, and a MicroMasters in Statistics and Data Science.

In my lab, very few people are interested in math, models, etc. (incredible, I know...), so I have very few people to ask about the best mathematical procedure for validating or modeling things. With that sad situation out of the way, lol, here is my question:

I'm running a causality test based on Granger causality between 3 zones of the brain.

The analysis gives me p-values associated with statistical causal connections between signals.

As an example, for one event of interest:

ZONE      P-Value
1 -> 2    0.056345
1 -> 3    0.005321
2 -> 1    0.003214
2 -> 3    0.000123
3 -> 1    0.245021
3 -> 2    0.002455

Now imagine I have 50 events of interest, and I want to report the "mean" causal connection as a function of the p-values: what is the best way to combine all those p-values?

Thanks a lot in advance.

This place helps me not to feel alone :/


There are several ways to combine p-values from independent hypothesis tests when all the hypotheses are the same but are tested on different data.

One of the first such methods is Fisher's method. Assume we have a set of hypotheses $\mathcal{H} = \left \{H_1, \cdots , H_k \right \}$ and the p-values from the tests are $p_1, \cdots , p_k$. According to Fisher's method, the combined p-value can be found from the statistic $-2 \sum_{i=1}^k \ln(p_i)$, which under the null follows a $\chi_{df=2k}^2$ distribution.
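As a minimal sketch of how this is computed in practice (using SciPy; the function name `fisher_combine` is my own, and the example p-values are the six directed connections from the question):

```python
import math
from scipy.stats import chi2, combine_pvalues

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method."""
    k = len(pvalues)
    # under H0, -2 * sum(ln p_i) ~ chi-square with 2k degrees of freedom
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # survival function of chi2(2k) gives the combined p-value
    return chi2.sf(stat, df=2 * k)

# p-values for the six directed connections in one event of interest
pvals = [0.056345, 0.005321, 0.003214, 0.000123, 0.245021, 0.002455]
combined = fisher_combine(pvals)

# SciPy's built-in combine_pvalues gives the same result
stat, p_scipy = combine_pvalues(pvals, method='fisher')
```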

The background behind Fisher's method for p-value combination is the following:

The CDF of the Exponential distribution with rate $\lambda$ is $F(x) = 1-e^{-\lambda x}$, and the inverse CDF is $x=-\frac{1}{\lambda} \ln(1-F(x))$. If $p$ is a uniform random variable on the interval $[0,1]$, then $1-p$ is also $\mathcal{U}(0,1)$. We can therefore write $x=-\frac{1}{\lambda} \ln(p)$ with $p \sim \mathcal{U}(0,1)$. This highlights the fact that the negative of the natural log of a random variable distributed as $\mathcal{U}(0,1)$ follows an Exponential distribution with rate $\lambda = 1$. We also know that $\chi_{df=2}^2$ is equivalent to $\text{Exp}(\lambda = 1/2)$ and that $\sum_{i=1}^k \chi_{i,\, df=2}^2 = \chi_{df=2k}^2$.

When the null hypothesis is true, the p-value is distributed as $\mathcal{U}(0,1)$, but the combination of $k$ p-values from $k$ independent tests of the same hypothesis, $\prod _{i=1}^k p_i$, does not follow $\mathcal{U}(0,1)$. Now we can use the facts above to find the distribution of $\prod _{i=1}^k p_i$: $$\ln(\prod p_i) = \sum \ln(p_i)$$ $$-\ln(p_i) \sim \text{Exp}(\lambda = 1)$$ $$-2\ln(p_i) \sim \text{Exp}(\lambda = 1/2) \sim \chi_{df=2}^2$$ $$-2\ln(\prod p_i)=-2\sum_{i=1}^k \ln(p_i) \sim \chi_{df=2k}^2$$
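You can check this distributional claim with a quick simulation: draw many sets of $k$ uniform "null" p-values, compute the Fisher statistic for each set, and compare the sample moments against those of $\chi^2_{2k}$ (mean $2k$, variance $4k$). A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5
n_rep = 100_000

# n_rep replicates of k independent null p-values, p_i ~ U(0,1)
p = rng.uniform(size=(n_rep, k))

# Fisher statistic for each replicate: -2 * sum_i ln(p_i)
stat = -2.0 * np.log(p).sum(axis=1)

# chi-square with 2k df has mean 2k and variance 4k,
# so stat.mean() should be near 10 and stat.var() near 20 for k = 5
```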

There are a number of other methods that are more "powerful" than Fisher's method. Through a simulation study, M. C. Whitlock showed that the weighted Z test (Stouffer's method) in the following form is more powerful than Fisher's method: $$Z = \frac{\sum_{i=1}^k w_i Z_i}{\sqrt{\sum_{i=1}^k w_i^2}}$$ Here the weights $w_i$ can be, for example, the inverse of the standard error or the sample size of each test.
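A minimal sketch of the weighted Z test, assuming one-sided p-values (the function name `weighted_z_combine` is my own; each p-value is converted to a z-score via the inverse survival function of the standard normal):

```python
import math
from scipy.stats import norm

def weighted_z_combine(pvalues, weights=None):
    """Stouffer's weighted Z method for one-sided p-values."""
    if weights is None:
        weights = [1.0] * len(pvalues)  # unweighted case
    # convert each p-value to a z-score: z_i = Phi^{-1}(1 - p_i)
    z = [norm.isf(p) for p in pvalues]
    # Z = sum(w_i * z_i) / sqrt(sum(w_i^2)) is standard normal under H0
    zw = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(sum(w * w for w in weights))
    return norm.sf(zw)
```

With all weights equal this reduces to the classic Stouffer combination; unequal weights let larger or more precise samples contribute more.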

Some other possible methods are:

  1. Truncated product method by Zaykin et al.
  2. Generalized Fisher's method by Lancaster