Let $X_k$ be the number of heads observed when coin $k$ is tossed $n_k$ times. Each coin has its own independent heads probability $p_k$:
$$ X_k \sim \text{Binomial}(n_k, p_k) $$
Each $p_k$ is drawn from some distribution $P$. I am interested in testing the hypothesis that $P$ has mean $\frac{1}{2}$. Is there an appropriate testing framework to use?
Does the choice of $P$ matter (a lot)? $P$ will be a reasonably nice distribution (e.g., continuous and unimodal).
I know this question may not be stated fully precisely; it is motivated by the design of a biological experiment. Any ideas or references would be appreciated.
Let's take $P$ to be the distribution of $G/(G+H)$, where $G$ and $H$ are iid Poisson with parameter $\lambda_k$. I'd be interested in the answer with $\lambda \in [10,100]$. This may be intractable. I'd also be interested in the answer with a simpler $P$, say the hat distribution on $[a, 1-a]$. It would be great if there were an (approximate) answer that depends only on the moments of $P$.
The spirit of the question is really to understand the relationship between $n$ and $k$. In order to have a powerful test, should I flip more coins, or should I flip each coin more often?
You could structure this as a Bayesian hierarchical model and analyze the posterior distribution of $P$ conditioned on your "populations", which are the coin toss outcomes under each $p_k$. This is, of course, just one example.
The precise posterior for $P$ will definitely depend on your choice of prior on $P$, but at the end of the day you can always approximate that posterior numerically.
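To make the numerical approximation concrete, here is a minimal sketch of one way it could be done. It rests on assumptions that are not part of the question: $P$ is taken to be a Beta distribution parametrized by its mean $\mu$ and a concentration $s$ (so $P = \text{Beta}(\mu s, (1-\mu)s)$), the priors are flat over a grid of $\mu$ values and over a log-spaced grid of $s$ values, and the function names are hypothetical. Integrating out each $p_k$ gives a Beta-Binomial likelihood for $X_k$, and the posterior of $\mu$ is then approximated on the grid.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_betabinom(x, n, a, b):
    """Log pmf of Beta-Binomial(n, a, b) at x (p_k integrated out)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + log_beta(x + a, n - x + b) - log_beta(a, b)


def posterior_of_mean(xs, ns, mu_grid, s_grid):
    """Grid posterior over mu = E[P], marginalizing the concentration s.

    Assumption: P = Beta(mu * s, (1 - mu) * s), flat prior on both grids.
    """
    log_post = []
    for mu in mu_grid:
        terms = []
        for s in s_grid:
            a, b = mu * s, (1.0 - mu) * s
            terms.append(sum(log_betabinom(x, n, a, b) for x, n in zip(xs, ns)))
        m = max(terms)  # log-sum-exp over s for numerical stability
        log_post.append(m + math.log(sum(math.exp(t - m) for t in terms)))
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def credible_interval(grid, post, level=0.95):
    """Equal-tailed credible interval read off the grid posterior."""
    tail = (1.0 - level) / 2.0
    cdf, lo, hi = 0.0, None, None
    for g, w in zip(grid, post):
        cdf += w
        if lo is None and cdf >= tail:
            lo = g
        if hi is None and cdf >= 1.0 - tail:
            hi = g
    return lo, (hi if hi is not None else grid[-1])


if __name__ == "__main__":
    random.seed(0)
    n_coins, n_tosses = 100, 20  # illustrative values, not from the question
    ps = [random.betavariate(6, 4) for _ in range(n_coins)]  # true mean 0.6
    xs = [sum(random.random() < p for _ in range(n_tosses)) for p in ps]
    ns = [n_tosses] * n_coins

    mu_grid = [i / 100 for i in range(1, 100)]
    s_grid = [10 ** (j / 5) for j in range(0, 16)]  # 1 to 1000, log-spaced
    post = posterior_of_mean(xs, ns, mu_grid, s_grid)
    print("posterior mean of mu:", sum(m * w for m, w in zip(mu_grid, post)))
    print("95% credible interval:", credible_interval(mu_grid, post))
```

Checking whether the credible interval contains $\frac{1}{2}$ is a Bayesian analogue of the test asked about, not a formal frequentist test. Rerunning the sketch with different `n_coins` and `n_tosses` is one rough way to explore the trade-off between flipping more coins and flipping each coin more often, though the result will depend on the assumed Beta form and priors.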