Exact Distribution of the Likelihood Ratio of the Null P-value Distribution to the Alternative P-value Distribution


Let \begin{equation*} f(Z_{p}) =\exp\left(\frac{1}{2}[Z_{p}^2-(Z_{p}-\sqrt{n}\theta)^2]\right), \end{equation*} where $f(Z_{p})$ is the distribution of the one-sided p-value for a Z test under the alternative (the distribution comes from here, on the third page, written as a ratio of two normal densities), $Z_{p} = \Phi^{-1}(1-p)$, $\Phi$ is the standard normal CDF, $0 < p < 1$, $Z_{p} \sim N(\sqrt{n}\theta,1)$, $n$ is the sample size, and $\theta$ is the effect size.
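For a sanity check, $f(Z_p)$ is just the $N(\sqrt{n}\theta,1)$ density of $Z_p$ divided by the standard normal density, and the closed form above can be verified numerically. A minimal sketch in Python (the values $n=30$, $\theta=0.5$, and the test point $z$ are illustrative):

```python
import numpy as np
from scipy.stats import norm

n, theta = 30, 0.5  # illustrative sample size and effect size
z = 1.2             # an arbitrary value of Z_p

# ratio of the N(sqrt(n)*theta, 1) density to the N(0, 1) density
ratio = norm.pdf(z, loc=np.sqrt(n) * theta) / norm.pdf(z)

# closed form f(Z_p) from the question
f = np.exp(0.5 * (z**2 - (z - np.sqrt(n) * theta)**2))

print(np.isclose(ratio, f))  # True
```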

I am having trouble finding the distribution of the likelihood ratio. To start, if $\theta=0$, then the distribution of the p-value is Uniform$(0,1)$ and the likelihood under the null is identically 1 (just plug $\theta=0$ into the expression above). Thus the ratio statistic is simply $1/\mathcal{L}(\theta;Z_{p})$, or

\begin{equation*} \frac{1}{\mathcal{L}(\theta;Z_{p})} =\exp(-\frac{1}{2}\sum_{i=1}^{m}[Z_{p_{i}}^2-(Z_{p_{i}}-\sqrt{n_{i}}\theta)^2]) \end{equation*}

Where $m$ is equal to the number of independent experiments performed.

When I first looked at this it seemed straightforward to find the distribution of this statistic (since $Z_{p}$ is normal). However, I feel that I've done something wrong or am interpreting my results incorrectly. Here is my work so far.

$$\begin{eqnarray} \frac{1}{\mathcal{L}(\theta;Z_{p})} &=&\exp\left(-\frac{1}{2}\sum_{i=1}^{m}[Z_{p_{i}}^2-(Z_{p_{i}}-\sqrt{n_{i}}\theta)^2]\right) \\ &=& \exp\left(-\frac{1}{2}\sum_{i=1}^{m}[Z_{p_{i}}^2-(Z_{p_{i}}^2-2\sqrt{n_{i}}\theta Z_{p_{i}}+n_{i}\theta^2)]\right) \\ &=& \exp\left(-\frac{1}{2}\sum_{i=1}^{m}[Z_{p_{i}}^2-Z_{p_{i}}^2+2\sqrt{n_{i}}\theta Z_{p_{i}}-n_{i}\theta^2]\right) \\ &=& \exp\left(-\frac{1}{2}\sum_{i=1}^{m}[2\sqrt{n_{i}}\theta Z_{p_{i}}-n_{i}\theta^2]\right) \\ &=& \exp\left(\sum_{i=1}^{m}[-\sqrt{n_{i}}\theta Z_{p_{i}}+\frac{1}{2}n_{i}\theta^2]\right) \end{eqnarray}$$
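The simplification can be checked numerically by evaluating both the first and last expressions on simulated $Z_{p_{i}}$. A Python sketch (the sample sizes and $\theta$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.5
n = np.array([30, 50, 20])               # illustrative sample sizes for m = 3 experiments
Zp = rng.normal(np.sqrt(n) * theta, 1)   # Z_{p_i} ~ N(sqrt(n_i)*theta, 1)

# original expression for 1/L
lhs = np.exp(-0.5 * np.sum(Zp**2 - (Zp - np.sqrt(n) * theta)**2))
# simplified expression after expanding the square
rhs = np.exp(np.sum(-np.sqrt(n) * theta * Zp + 0.5 * n * theta**2))

print(np.isclose(lhs, rhs))  # True
```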

To find the distribution of the sum, I decided to start with a single component and then use the additive property of normals. Since $Z_{p} \sim N(\sqrt{n}\theta,1)$

$$\begin{eqnarray} E(-\sqrt{n}\theta Z_{p}+\frac{1}{2}n\theta^2) &=& -\sqrt{n}\theta E(Z_{p})+\frac{1}{2}n\theta^2\\ &=& -\sqrt{n}\theta \sqrt{n}\theta+\frac{1}{2}n\theta^2\\ &=& -n\theta^2+\frac{1}{2}n\theta^2\\ &=& -\frac{1}{2}n\theta^2 \end{eqnarray}$$

and

$$\begin{eqnarray} Var(-\sqrt{n}\theta Z_{p}+\frac{1}{2}n\theta^2) &=& n\theta^2 Var(Z_{p})\\ &=& n\theta^2*1\\ &=& n\theta^2 \end{eqnarray}$$
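Both moments agree with a quick Monte Carlo check (Python sketch; $n$ and $\theta$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta = 30, 0.5
Zp = rng.normal(np.sqrt(n) * theta, 1, size=200_000)  # draws of Z_p under the alternative
X = -np.sqrt(n) * theta * Zp + 0.5 * n * theta**2     # one component of the exponent

print(X.mean())  # close to -0.5 * n * theta**2 = -3.75
print(X.var())   # close to        n * theta**2 =  7.5
```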

Summing the $m$ independent components (with a different sample size $n_{i}$ per experiment) we should get that $\sum_{i=1}^{m}[-\sqrt{n_{i}}\theta Z_{p_{i}}+\frac{1}{2}n_{i}\theta^2] \sim N(-\frac{1}{2}\sum_{i=1}^{m}n_{i} \theta^2, \sum_{i=1}^{m}n_{i}\theta^2)$. The exponential of a normal random variable is lognormal, so we should have \begin{equation*} \frac{1}{\mathcal{L}(\theta;Z_{p})} \sim \text{LogNormal}\left(-\frac{1}{2}\sum_{i=1}^{m}n_{i} \theta^2,\; \sum_{i=1}^{m}n_{i}\theta^2\right), \end{equation*} where the second parameter is the variance (not the standard deviation) of the underlying normal.
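The lognormal claim itself can be checked by simulating $1/\mathcal{L}$ and comparing quantiles. A Python sketch with illustrative sample sizes (note that SciPy's `lognorm` is parameterized by the *standard deviation* `s` of the underlying normal, so the variance above must be square-rooted before being passed in):

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(3)
theta = 0.5
n = np.array([30, 50, 20])  # illustrative sample sizes for m = 3 experiments
sims = 200_000

# simulate 1/L under the alternative
Zp = rng.normal(np.sqrt(n) * theta, 1, size=(sims, n.size))
inv_L = np.exp(np.sum(-np.sqrt(n) * theta * Zp + 0.5 * n * theta**2, axis=1))

mu = -0.5 * np.sum(n) * theta**2       # mean of the underlying normal
sigma = np.sqrt(np.sum(n) * theta**2)  # sd of the underlying normal (sqrt of the variance)

q_sim = np.quantile(inv_L, 0.05)
q_theory = lognorm.ppf(0.05, s=sigma, scale=np.exp(mu))
print(np.isclose(np.log(q_sim), np.log(q_theory), atol=0.15))  # True
```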

However, when $\theta=0$ this gives a degenerate distribution (which I'm not sure should happen). Also, when trying to find the appropriate cutoff $k$ for a likelihood ratio test (with $\alpha=.05$), I get values that seem far too small. Using R's quantile function for a .05 cutoff, even with $\theta=.5$, $n=30$, and $m=1$, I get a value of $1.032201\times 10^{-7}$! Can y'all help me understand what is going wrong / what I am interpreting incorrectly in these results?

The code I used was

qlnorm(.05, -.25*15, .25*30)  # qlnorm(p, meanlog, sdlog); here -.25*15 = -3.75 and .25*30 = 7.5 is passed as the third argument