Distribution of parameter under the null hypothesis for mixture distributions


I am conducting a classical hypothesis test concerning the value of some parameter, i.e. $H_{0}:\theta=\theta_{0}$. I'll denote the distribution of $\theta$ under the null as $f(\theta)$.

Suppose there is a 50% chance that $f(\theta)=g(\theta)$, a 30% chance that $f(\theta)=h(\theta)$ and a 20% chance that $f(\theta)=m(\theta)$, where $g$, $h$ and $m$ are all known.

Intuitively, it doesn't seem to follow that one can use $0.5g(\theta)+0.3h(\theta)+0.2m(\theta)$ to form the distribution of $\theta$ under $H_{0}$ (the resulting distribution doesn't have a practical interpretation and of course won't consistently estimate the true distribution). Does anyone have any suggestions of what I could do? Is there even a solution to this problem?

PS: Not sure if this is relevant, but $g$, $h$ and $m$ were formed using a nonparametric kernel density estimator.

Edit:

  • The alternative hypothesis is $H_{A}:\theta≠\theta_{0}$.
  • $\theta$ is neither a scale nor a location parameter. It is actually one of the coefficients (the intercept) in an LS-regression model, but I am applying an algorithm to get around a tricky (and rather unique) multiple comparisons problem. The final step in this is forming the null distribution to conduct a hypothesis test, and whilst I have calculated the vector of probabilities (0.5, 0.3, 0.2) and the corresponding null distributions ($g$, $h$, $m$), I have become stuck on this final step!

There are 1 best solutions below


Comment: I want to use simulation to demonstrate the PDF of your mixture of distributions. Let the first be $Exp(rate = 1)$ (50%), the second be $Norm(\mu=2,\sigma=1/2)$ (30%), and the third be $Unif(2,4)$ (20%).

Simulation. The program below makes these choices for 100,000 observations, plots the histogram, and then plots the weighted average of the PDFs that you do not find intuitive--and I do.

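 B = 10^5;  x = k = numeric(B)    # B draws; k records the component chosen
 for (i in 1:B) {
   h = sample(1:3, 1, prob=c(.5,.3,.2));  k[i] = h    # pick component 1, 2 or 3
   # keep only the draw from the chosen component (the others are multiplied by 0)
   x[i] = (h==1)*rexp(1,1) + (h==2)*rnorm(1,2,.5) + (h==3)*runif(1,2,4) }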

 table(k)/B
 ## k
 ##        1        2        3 
 ## 0.500413 0.299687 0.199900 

 mean(x);  sd(x)
 ## 1.699684   # apprx E(X)
 ## 1.118747   # apprx SD(X)

Comments on simulation results. It seems from table(k)/B that the distributions were chosen in the intended proportions. Also, $E(X) = .5(1) + .3(2) + .2(3) = 1.7$ is well approximated. So the simulation program is working as intended.

It is a little more difficult to find the variance $V(X)$ of the mixture distribution because it reflects both the variances of the three individual distributions and the scatter of their means. (See the Wikipedia article on 'mixture distributions' under moments.)
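As a rough check on the simulated SD, the mixture variance can be computed from the component means and variances using $V(X) = E(X^2) - [E(X)]^2$, where $E(X^2)$ is the weighted average of the component second moments. This is only a sketch for the three example distributions above; the names `w`, `mu`, `v`, `EX`, `EX2`, `VX` and `SDX` are mine.

```r
# Weights, means and variances of the three example components:
# Exp(rate = 1), Norm(mu = 2, sigma = 1/2), Unif(2, 4)
w  = c(.5, .3, .2)
mu = c(1, 2, 3)
v  = c(1, .5^2, (4 - 2)^2/12)

EX  = sum(w * mu)            # mixture mean
EX2 = sum(w * (v + mu^2))    # mixture second moment: sum of w_j * E(X_j^2)
VX  = EX2 - EX^2             # mixture variance
SDX = sqrt(VX)               # mixture standard deviation
EX;  VX;  SDX
```

This gives $E(X) = 1.7$, $V(X) \approx 1.2517$ and $SD(X) \approx 1.1188$, in line with the simulated value of about 1.1187.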

 hist(x, prob=T, col="skyblue2", ylim=c(0,.5))
 curve(.5*dexp(x,1)+.3*dnorm(x,2,.5)+.2*dunif(x,2,4), 
    lwd=2, col="blue", n=1001, add=T)

[Figure: histogram of the simulated values with the weighted sum of the three PDFs overlaid as a blue curve]

It seems that the weighted sum of PDFs is the correct PDF. I deliberately chose three different distributional shapes in order to get a combined distribution with an unusually shaped PDF and a relatively obvious visual test of fit. (The bit of a tail to the right of 4 is due mainly to the exponential component and partly to the normal component.)
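The same conclusion follows from the law of total probability. As a sketch, let $Z$ be a label for which component is chosen (my notation, not in the question), with $P(Z=1)=.5$, $P(Z=2)=.3$, $P(Z=3)=.2$, and let $f_1, f_2, f_3$ be the three component densities. Then

$$P(X \le x) = \sum_{j=1}^{3} P(Z=j)\,P(X \le x \mid Z=j) \quad\Longrightarrow\quad f_X(x) = .5 f_1(x) + .3 f_2(x) + .2 f_3(x).$$

In the question's notation this corresponds to $0.5g(\theta)+0.3h(\theta)+0.2m(\theta)$, assuming exactly one of $g$, $h$, $m$ applies, with the stated probabilities.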

Next? We will have to think about how to do the testing. Maybe something like a one-sample t statistic will work, and we can discover the critical value of its distribution through simulation. Because you have seen some data, you may have better suggestions what to try next.
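To make the simulation idea concrete, here is a minimal sketch of how critical values for a one-sample t statistic might be found by simulation. It assumes, purely for illustration, that the data are an i.i.d. sample of size $n = 20$ from the example mixture above and that the null value of the mean is 1.7. Your regression setting will differ, and the helper `rmix` and the choices of `n`, `B` and the seed are hypothetical.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only

rmix = function(n) {               # n draws from the example mixture
  h = sample(1:3, n, replace=TRUE, prob=c(.5,.3,.2))
  ifelse(h==1, rexp(n,1), ifelse(h==2, rnorm(n,2,.5), runif(n,2,4)))
}

n = 20;  mu0 = 1.7;  B = 10^4      # illustrative sample size, null mean, iterations
t.stat = replicate(B, { y = rmix(n);  (mean(y) - mu0)/(sd(y)/sqrt(n)) })

crit = quantile(t.stat, c(.025, .975))   # simulated critical values, 5% level
crit
qt(c(.025, .975), n-1)                   # Student's t critical values, for comparison
```

If the simulated quantiles differ noticeably from the Student's t values, the simulated ones would be the ones to use for data of this kind.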