I have a (finite) set of events $A,B,C,$... .
I know the unconditional probabilities of each event, $P(A), P(B), P(C),$... .
I also know each pairwise probability $P(A \cap B), P(A \cap C), P(B \cap C),$ ... .
I know that the probability of all events occurring together, $P(A \cap B \cap C \cap \dots)$, is not fully determined by the probabilities I know, but it has to be consistent with a number of equations. These restrict the possible values of $P(A \cap B \cap C \cap \dots)$.
For example, in the case of only 3 events $A,B,C$, I know that the estimate has to be consistent with the equations
1) $P(A \cap B \cap C) = P(C|A \cap B)\cdot P(A \cap B)$
2) $P(A \cap B \cap C) = P(B|A \cap C)\cdot P(A \cap C)$
3) $P(A \cap B \cap C) = P(A|C \cap B)\cdot P(C \cap B)$
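The restriction described above can be made concrete: given the marginals and the pairwise probabilities, the feasible values of $P(A \cap B \cap C)$ form an interval, which a small linear program over the 8 elementary outcomes can compute. A sketch in Python (all probability values below are hypothetical examples, and `scipy` is assumed available):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Hypothetical known quantities
pA, pB, pC = 0.7, 0.6, 0.5
pAB, pAC, pBC = 0.5, 0.4, 0.35

# Decision variables: probabilities of the 8 atoms (a, b, c) in {0,1}^3
atoms = list(itertools.product([0, 1], repeat=3))

def row(pred):
    # Indicator row selecting the atoms where the event holds
    return [1.0 if pred(*atom) else 0.0 for atom in atoms]

A_eq = np.array([
    row(lambda a, b, c: True),     # total mass = 1
    row(lambda a, b, c: a == 1),   # P(A)
    row(lambda a, b, c: b == 1),   # P(B)
    row(lambda a, b, c: c == 1),   # P(C)
    row(lambda a, b, c: a and b),  # P(A ∩ B)
    row(lambda a, b, c: a and c),  # P(A ∩ C)
    row(lambda a, b, c: b and c),  # P(B ∩ C)
])
b_eq = [1, pA, pB, pC, pAB, pAC, pBC]
obj = np.array(row(lambda a, b, c: a and b and c))  # P(A ∩ B ∩ C)

# Minimize and maximize P(A ∩ B ∩ C) subject to the constraints
lo = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
hi = -linprog(-obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
print(lo, hi)  # feasible interval for P(A ∩ B ∩ C)
```

For these numbers the interval is $[0.25, 0.35]$, so any estimate must land there; with more events the same linear program over $2^n$ atoms gives the feasible range.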
Probably a bit naively I first tried to estimate $P(C|A \cap B)$ as the mean of $P(C|A)$ and $P(C|B)$. However, I realised this is not necessarily consistent with the equations.
In the case of more than 3 events, the equations become more complicated.
What is a reasonable estimate of $P(A \cap B \cap C...)$?
If there are only two events $A,B$, and the events are independent, $P(A \cap B) = P(A) \cdot P(B)$. Is there any kind of "higher order independence" that I can assume, so that I can compute $P(A \cap B \cap C...)$ from my limited information?
I think I found a reasonable way to estimate what I want.
I estimate a latent multivariate normal distribution: I assume that each binary variable stems from an underlying normal distribution, where every value below a certain threshold is coded as $0$, and every value above it as $1$.
For example, for variable $A$ with $P(A) = 0.7$, I assume that the underlying latent variable is standard normal; every value below $z = -0.52$ is assigned $\overline{A}$ (or $0$), and every value above it is assigned $A$ (or $1$), because 30% of the probability mass lies below this threshold.
(Figures illustrating this thresholding can be found in: Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3162326/)
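The threshold is just the inverse normal CDF evaluated at the "below" mass; a quick check for the $P(A) = 0.7$ example above (using `scipy`):

```python
from scipy.stats import norm

# Threshold below which the standard normal latent value is coded as 0:
# the 30th percentile, since P(A) = 0.7 leaves 30% of the mass below it.
z = norm.ppf(1 - 0.7)
print(round(z, 3))  # -0.524
```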
From the bivariate distribution of every pair of variables, I estimate their polychoric correlation, which is the correlation of the underlying latent variables: https://en.wikipedia.org/wiki/Polychoric_correlation
I obtain a variance-covariance matrix and a vector of means of a multivariate normal distribution, from which I can sample.
After sampling, I convert the continuous variables back to categorical variables by applying the same thresholds.
Then I can estimate probabilities like $P(A \cap B \cap C \cap \overline{D})$ from the simulated data.
A short example for 3 variables:
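The whole procedure can be sketched in Python (the original example was in R; the marginals, pairwise probabilities, and sample size below are hypothetical):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

# Hypothetical known marginals and pairwise joint probabilities
p = {"A": 0.7, "B": 0.6, "C": 0.5}
p_pair = {("A", "B"): 0.5, ("A", "C"): 0.4, ("B", "C"): 0.35}
names = list(p)

# Threshold per variable: P(not X) of the latent mass lies below it
thr = {k: norm.ppf(1 - v) for k, v in p.items()}

def joint_above(rho, t1, t2):
    """P(X1 > t1, X2 > t2) for a standard bivariate normal with corr rho."""
    mvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
    # By symmetry of the centered normal: P(X1 > t1, X2 > t2) = P(X1 < -t1, X2 < -t2)
    return mvn.cdf([-t1, -t2])

# Polychoric (here: tetrachoric) correlation for each pair: find the rho
# that reproduces the observed pairwise probability
rho = {}
for (a, b), pab in p_pair.items():
    rho[(a, b)] = brentq(
        lambda r: joint_above(r, thr[a], thr[b]) - pab, -0.999, 0.999
    )

# Correlation matrix of the latent multivariate normal (zero means)
R = np.eye(3)
for i, a in enumerate(names):
    for j, b in enumerate(names):
        if i < j:
            R[i, j] = R[j, i] = rho[(a, b)]

# Sample the latent normals, threshold back to binary variables
rng = np.random.default_rng(0)
z = rng.multivariate_normal(np.zeros(3), R, size=200_000)
binary = z > np.array([thr[k] for k in names])

# Estimate the joint probability from the simulated data
p_abc = binary.all(axis=1).mean()
print(p_abc)
```

Probabilities of other combinations, e.g. $P(A \cap B \cap \overline{C})$, come out of the same simulated data by selecting the corresponding columns.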
@joriki I think that maximizing entropy would still be the best approach, but since the difference in computation time is huge, I will use this simpler method. And since the Gaussian distribution is the maximum-entropy distribution for a given mean and variance, this solution should not be too far off. Big thanks anyway!