Intro
This question is somewhat of a prequel to this one, but it can also be read as a standalone!
I have two distinct populations (sets of measurements) of boolean values: $\{x_i\}_{i=1}^{m_1}$ with $x_i \in \{0,1\}$, $|x|=m_1$, and $\{y_i\}_{i=1}^{m_2}$ with $y_i\in\{0,1\}$, $|y|=m_2$. Each $x_i$ (and $y_i$) is independent of all the other boolean values.
These numbers represent the activity values ($1$ = active, $0$ = inactive) of two different groups of models: the good ones (the $x$'s) and the bad ones (the $y$'s). I wanted a measure of how much the (mean) activity of the good models differs from that of the bad models. So I started with the averages $a_1=\mathrm{mean}(x)$ and $a_2=\mathrm{mean}(y)$ and took the difference $d=a_1-a_2$. Then I wanted to include a penalty related to the numbers of models $m_1,m_2$, and this was solved in this question by introducing a proper scaling function for the difference $d$.
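As a minimal sketch of this starting point (in Python rather than R, and with made-up data since the post shows none), the raw difference $d=a_1-a_2$ is just a difference of two sample fractions:

```python
def mean(values):
    """Arithmetic mean of a non-empty list of 0/1 activity values."""
    return sum(values) / len(values)

# Hypothetical data: 1 = active, 0 = inactive.
x = [1, 0, 1, 1, 0, 1, 0, 1]  # good models, m1 = 8
y = [0, 0, 1, 0, 0, 1]        # bad models,  m2 = 6

a1 = mean(x)  # fraction of good models that are active
a2 = mean(y)  # fraction of bad models that are active
d = a1 - a2   # raw difference in mean activity, no sample-size penalty yet
print(a1, a2, d)
```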
The problem
I was thinking that if I had two populations of continuous measurements, I would probably use a Wilcoxon rank-sum test (e.g. wilcox.test in R) and get a p-value plus an estimate of the location shift (something like $d$, as defined in that implementation). For the boolean case, ranking methods are useless, right? The p-value would tell me how sure I am about this difference, which is roughly what I wanted to solve numerically (by, say, inserting the uncertainty into the final formula) in this other question.
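A quick way to see what a rank-sum test does with boolean data: under the usual midrank convention for ties, every 0 in the pooled sample gets the same rank and every 1 gets the same rank, so the rank sum of $x$ is fixed once you know how many 1s $x$ contains. On boolean data a rank-sum test therefore effectively reduces to comparing the numbers of 1s in the two groups. The sketch below checks this in plain Python with hypothetical data and a hand-written `midranks` helper (an illustrative stand-in, not R's `rank`):

```python
def midranks(values):
    """1-based ranks in which tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Hypothetical activity vectors (1 = active, 0 = inactive).
x = [1, 0, 1, 1, 0, 1, 0, 1]
y = [0, 0, 1, 0, 0, 1]
pooled = x + y
r = midranks(pooled)

# Only two distinct ranks remain: one shared by every 0, one by every 1.
print(sorted(set(r)))

n0, n1 = pooled.count(0), pooled.count(1)
r0 = (n0 + 1) / 2        # common midrank of the zeros
r1 = n0 + (n1 + 1) / 2   # common midrank of the ones

# Rank sum of x versus the value predicted from its number of 1s alone.
w_x = sum(r[: len(x)])
print(w_x, (len(x) - sum(x)) * r0 + sum(x) * r1)
```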
All in all, I need a statistical test that compares boolean values and gives me a p-value and the difference $d$ (or something similar), so to speak.
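One standard option for this situation (not something from the original post) is the pooled two-proportion z-test: it returns the difference of the two sample fractions, which is the same $d$ as above, together with a two-sided p-value for $H_0: P(x_i=1)=P(y_i=1)$. Below is a stdlib-only Python sketch; the function name `two_proportion_z_test` and the data are hypothetical. The normal approximation is rough for small $m_1, m_2$; in R, `prop.test` covers the approximate version and `fisher.test` is the usual exact alternative.

```python
import math

def two_proportion_z_test(x, y):
    """Pooled two-proportion z-test for H0: P(x_i = 1) == P(y_i = 1).

    Returns (d, z, p_value), where d = mean(x) - mean(y) and the
    p-value is two-sided, based on the normal approximation.
    """
    m1, m2 = len(x), len(y)
    s1, s2 = sum(x), sum(y)
    d = s1 / m1 - s2 / m2
    p_pool = (s1 + s2) / (m1 + m2)  # common activity rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / m1 + 1 / m2))
    if se == 0:
        # All observations identical (all 0s or all 1s): no evidence of a difference.
        return d, 0.0, 1.0
    z = d / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return d, z, p_value

# Hypothetical activity vectors for the good (x) and bad (y) models.
d, z, p = two_proportion_z_test([1, 0, 1, 1, 0, 1, 0, 1], [0, 0, 1, 0, 0, 1])
print(d, z, p)
```

The sign of `z` follows the sign of `d`, so a positive `z` means the good models are more often active in the sample.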
What I tried
So, since I work with two different and independent sets of boolean values $\{x_i\}$ and $\{y_i\}$, the words binomial distribution and binomial test came to mind (from loooong ago :) But after reading about them for a while and playing with the respective R functions, it turned out that they were not what I wanted.
The most significant problem for me was that I do not know the probability of success, i.e. $p=P(x_i=1)$ (or $P(y_i=1)$ for that matter); actually, that's exactly what I want to know for each model category! I tried to use the average as the estimate, $\hat p=a_1=\mathrm{mean}(x)$, and then to take a value of the cumulative distribution, but it does not give results representative of what I want (the average activity). The binomial test does not come close to what I want either.
For example, with $x=\{0,0,0,0,1,0\}$, $m_1=n=6$ and $\hat p=1/6\approx 0.1667$, I would calculate $P(X\geq 1)=1-(1-\hat p)^n\approx 0.665$, which is way higher than expected.
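The number in this example can be reproduced directly. It is the probability that at least one of six independent models is active when each is active with probability $1/6$, a statement about the count of active models rather than an estimate of the average activity, which is why it comes out much larger than $1/6$. A minimal Python sketch (the helper `binom_pmf` is written here for illustration):

```python
import math

x = [0, 0, 0, 0, 1, 0]
n = len(x)            # m1 = n = 6
p_hat = sum(x) / n    # plug-in estimate of p = P(x_i = 1), here 1/6

def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# P(X >= 1) = 1 - P(X = 0) = 1 - (1 - p)^n
tail = 1 - binom_pmf(0, n, p_hat)
print(round(tail, 3))  # 0.665
```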
Maybe I am overthinking it, but I believe there should be something close to what I want to do with this boolean dataset. Any pointers to distribution(s) or statistical tests that can accommodate my case are welcome!
Also posted on Cross Validated.
If you are measuring two random boolean variables continuously (over time) and would like to combine the distributions and consider them as one, I'd recommend a Poisson distribution.
It's a close approximation to the binomial when the number of trials is large and the success probability is small, and the concepts of the "average frequency of event $A$" and the "average frequency of event $B$" are clearly additive: the average frequency (per unit time) of events $A$ and $B$ combined is the sum of the two means.
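Both claims can be checked numerically. The sketch below (Python, stdlib only; the rates `l1`, `l2` are arbitrary illustrative values) compares a binomial tail with its Poisson counterpart at $\lambda = np$, once for the question's $n=6, p=1/6$ and once for a larger $n$ with a smaller $p$, and then checks by convolution that the sum of two independent Poisson counts follows the Poisson pmf with rate $\lambda_1+\lambda_2$. The approximation gap should be noticeably larger at $n=6$ than at $n=600$.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# 1) Approximation quality: Bin(n, p) versus Poisson(n * p).
for n, p in [(6, 1 / 6), (600, 1 / 600)]:
    lam = n * p
    b_tail = 1 - binom_pmf(0, n, p)     # binomial P(X >= 1)
    p_tail = 1 - poisson_pmf(0, lam)    # Poisson  P(N >= 1) = 1 - exp(-lam)
    print(n, round(b_tail, 4), round(p_tail, 4))

# 2) Additivity: if A ~ Poisson(l1) and B ~ Poisson(l2) are independent,
#    then A + B ~ Poisson(l1 + l2). Compare the convolution with the direct pmf.
l1, l2 = 1.5, 0.7
for k in range(6):
    conv = sum(poisson_pmf(j, l1) * poisson_pmf(k - j, l2) for j in range(k + 1))
    print(k, round(conv, 6), round(poisson_pmf(k, l1 + l2), 6))
```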