Estimate $P(A \cap B \cap C)$ from $P(A \cap C), P(B \cap C), P(A \cap B)$


I have a (finite) set of events $A, B, C, \dots$

I know the unconditional probabilities of each event, $P(A), P(B), P(C), \dots$

I also know each pairwise probability $P(A \cap B), P(A \cap C), P(B \cap C), \dots$

I know that the probability of all events occurring together, $P(A \cap B \cap C ...)$, is not fully determined by the probabilities I know, but it has to be consistent with a number of equations. Those restrict the possible values of $P(A \cap B \cap C ...)$.

For example, in the case of only 3 events $A,B,C$, I know that the estimate has to be consistent with the equations

1) $P(A \cap B \cap C) = P(C|A \cap B)\cdot P(A \cap B)$

2) $P(A \cap B \cap C) = P(B|A \cap C)\cdot P(A \cap C)$

3) $P(A \cap B \cap C) = P(A|C \cap B)\cdot P(C \cap B)$

Probably a bit naively, I first tried to estimate $P(C|A \cap B)$ as the mean of $P(C|A)$ and $P(C|B)$. However, I realised this is not necessarily consistent with the equations above.
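To see the inconsistency concretely, here is a small Python sketch with made-up probabilities (all numbers hypothetical): the three factorizations generally give three different values under this averaging heuristic.

```python
# Hypothetical marginal and pairwise probabilities, chosen only for illustration
p_A, p_B, p_C = 0.5, 0.4, 0.35
p_AB, p_AC, p_BC = 0.25, 0.10, 0.15

# Estimate each conditional as the mean of the two pairwise conditionals,
# then plug it into the corresponding factorization of P(A ∩ B ∩ C)
est1 = 0.5 * (p_AC / p_A + p_BC / p_B) * p_AB  # P(C|A∩B) ≈ mean(P(C|A), P(C|B))
est2 = 0.5 * (p_AB / p_A + p_BC / p_C) * p_AC  # P(B|A∩C) ≈ mean(P(B|A), P(B|C))
est3 = 0.5 * (p_AB / p_B + p_AC / p_C) * p_BC  # P(A|B∩C) ≈ mean(P(A|B), P(A|C))

print(est1, est2, est3)  # three different values, so the estimate is inconsistent
```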

In the case of more than 3 events, the equations become more complicated.

What is a reasonable estimate of $P(A \cap B \cap C...)$?

If there are only two events $A,B$, and the events are independent, $P(A \cap B) = P(A) \cdot P(B)$. Is there any kind of "higher order independence" that I can assume, so that I can compute $P(A \cap B \cap C...)$ from my limited information?
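Assuming mutual independence would give $P(A \cap B \cap C) = P(A)P(B)P(C)$, but that is only consistent with the given data if every pairwise probability already equals the product of its marginals. A quick Python check with hypothetical numbers:

```python
import math

p = {"A": 0.2, "B": 0.2, "C": 0.3}                             # hypothetical marginals
pairs = {("A", "B"): 0.04, ("A", "C"): 0.06, ("B", "C"): 0.06}  # hypothetical pairwise

# Mutual independence implies P(X ∩ Y) = P(X)P(Y) for every pair;
# only then does P(A ∩ B ∩ C) = P(A)P(B)P(C) fit the given data.
consistent = all(math.isclose(pxy, p[x] * p[y]) for (x, y), pxy in pairs.items())
triple = p["A"] * p["B"] * p["C"] if consistent else None
print(consistent, triple)
```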

3 Answers

BEST ANSWER

I think I found a reasonable way to estimate what I want.

I fit a latent multivariate normal distribution: I assume that each binary variable stems from an underlying normal variable, but every value below a certain threshold is coded as 0, and every value above as 1.

For example, for variable $A$ with $P(A) = 0.7$ I assume that the underlying latent variable is standard normal; every value below $z = -0.52$ is assigned $\overline{A}$ (or $0$), and every value above is assigned $A$ (or $1$). (Because 30% of the probability mass lies below this threshold.)
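The threshold comes straight from the inverse normal CDF; a quick check using Python's standard library (assuming a standard-normal latent variable):

```python
from statistics import NormalDist

p_A = 0.7
# threshold below which 1 - P(A) = 30% of the standard-normal mass lies
z = NormalDist().inv_cdf(1 - p_A)
print(round(z, 2))  # -0.52
```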

[Figure: a latent normal variable dichotomized at a threshold]

Pictures from: Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3162326/

From the bivariate distribution of every pair of variables, I estimate their polychoric correlation, which is the correlation of the underlying latent variables: https://en.wikipedia.org/wiki/Polychoric_correlation

[Figure: bivariate latent distribution underlying a polychoric correlation]

I obtain a variance-covariance matrix and a vector of means of a multivariate normal distribution, from which I can sample.

After sampling, I convert the continuous variables back to categorical variables.

Then I can estimate probabilities like $P(A \cap B \cap C \cap \overline{D})$ from the simulated data.

A short example for 3 variables in R:

# polychoric correlation solution -----------------------------------------

library(tidyverse)
library(polycor)

p_A <- 0.2
p_B <- 0.2
p_C <- 0.3

# latent means: with the threshold at 0, P(latent > 0) equals the marginal
mu <- qnorm(c(p_A, p_B, p_C))

p_AcondB <- 0.3 # P(A|B) > P(A): positive correlation
p_AcondC <- 0.1 # P(A|C) < P(A): negative correlation
p_BcondC <- 0.2 # P(B|C) = P(B): no correlation

p_AandB <- p_B * p_AcondB
p_AandC <- p_C * p_AcondC
p_BandC <- p_C * p_BcondC

## obtain latent correlations from the 2x2 cell probabilities

corl_AB <- polychor(matrix(c(
  1 - p_A - p_B + p_AandB, p_B - p_AandB,
  p_A - p_AandB, p_AandB
), nrow = 2))

corl_AC <- polychor(matrix(c(
  1 - p_A - p_C + p_AandC, p_C - p_AandC,
  p_A - p_AandC, p_AandC
), nrow = 2))

corl_BC <- polychor(matrix(c(
  1 - p_B - p_C + p_BandC, p_C - p_BandC,
  p_B - p_BandC, p_BandC
), nrow = 2))

S_lat <- matrix(c(1,       corl_AB, corl_AC,
                  corl_AB, 1,       corl_BC,
                  corl_AC, corl_BC, 1), nrow = 3)

n_sims <- 1e6

# sample from the latent multivariate normal ...
Y_corr <- MASS::mvrnorm(n = n_sims, mu = mu, Sigma = S_lat)

daty <- Y_corr %>% as.data.frame()
names(daty) <- c("A", "B", "C")

# ... and dichotomize at the threshold 0
daty <- daty %>% mutate_all(~ifelse(. > 0, 1, 0))

get_p <- function(df){
  data.frame(
    name = c("p(A)", "p(B)", "p(C)",
             "p(A,B)", "p(A,C)", "p(B,C)",
             "p(A,B,C)"),
    value = c(mean(df$A),
              mean(df$B),
              mean(df$C),
              mean(df$A & df$B),
              mean(df$A & df$C),
              mean(df$B & df$C),
              mean(df$A & df$B & df$C))
  )
}

get_p(daty)

@joriki I think that maximizing entropy would still be the best approach. But since the difference in computation time is huge, I will use this simpler method. And since the Gaussian distribution is the maximum-entropy distribution for a given mean and variance, this solution should not be too far off. Big thanks anyway!

ANSWER

There is a "higher-order independence"; in fact there are two different concepts of higher-order independence: pairwise independence and mutual independence (see Wikipedia). But your variables are (typically) not even pairwise independent, since you're arbitrarily specifying the probabilities of events and pairs of events.

Rather, I think a reasonable generalization of the case of two events here would be to maximize the information entropy. Given probabilities for two events, independence maximizes the information entropy of their joint distribution. In a sense, by maximizing the information entropy, you're making as few assumptions as possible beyond the given data.

Unfortunately, this doesn't seem to lead to tractable equations. If I didn't make a mistake, in the case of three events the stationarity condition for $p_{ABC}=P(A\cap B\cap C)$ is

$$ p_{ABC}\,q_Aq_Bq_C=(p_{AB}-p_{ABC})(p_{AC}-p_{ABC})(p_{BC}-p_{ABC})\,\overline q $$

with $q_A=p_A-p_{AB}-p_{AC}+p_{ABC}$ (and analogously for $q_B$ and $q_C$) and $\overline q=1-p_A-p_B-p_C+p_{AB}+p_{AC}+p_{BC}-p_{ABC}$. The quartic terms on the two sides cancel, so this is a cubic equation for $p_{ABC}$ in terms of the given probabilities.

To derive this equation, express the six given probabilities and the unknown probability $p_{ABC}$ in terms of the probabilities of the elementary events $A_i\cap B_j\cap C_k$ where $A_i\in\{A,\overline A\}$ etc., e.g. $p_A=p_{ABC}+p_{AB\overline C}+p_{A\overline BC}+p_{A\overline B\overline C}$ and $p_{AB}=p_{ABC}+p_{AB\overline C}$. That gives you $7$ linear equations, and the eighth is the normalization condition for the sum over the elementary probabilities. It's straightforward to solve this system of linear equations for the elementary probabilities; by inclusion–exclusion the solution is $p_{AB\overline C}=p_{AB}-p_{ABC}$ (and analogously for $p_{A\overline BC}$ and $p_{\overline ABC}$), $p_{A\overline B\overline C}=p_A-p_{AB}-p_{AC}+p_{ABC}=q_A$ (and analogously for $p_{\overline AB\overline C}$ and $p_{\overline A\overline BC}$) and $p_{\overline A\overline B\overline C}=1-p_A-p_B-p_C+p_{AB}+p_{AC}+p_{BC}-p_{ABC}=\overline q$. Now add up the entropy contributions from the elementary probabilities, set the derivative with respect to $p_{ABC}$ to zero (the coefficients $\pm1$ of $p_{ABC}$ in the eight elementary probabilities sum to zero, so the constant terms drop out), and exponentiate to obtain the algebraic equation.
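Even without a closed form, the maximum-entropy value is easy to find numerically: the entropy is concave along the feasible segment of $p_{ABC}$, so the difference of the logs of the two sides of the stationarity condition is monotone and bisection finds the unique interior root. A Python sketch (all input numbers hypothetical):

```python
import math

# Hypothetical marginals and pairwise probabilities
pA, pB, pC = 0.5, 0.4, 0.35
pAB, pAC, pBC = 0.25, 0.10, 0.15

# Feasible range of x = p_ABC: all eight elementary probabilities must be >= 0
lo = max(0.0, pAB + pAC - pA, pAB + pBC - pB, pAC + pBC - pC)
hi = min(pAB, pAC, pBC, 1 - pA - pB - pC + pAB + pAC + pBC)

def g(x):
    # log(LHS) - log(RHS) of the entropy stationarity condition;
    # by concavity of the entropy this is monotone increasing in x
    qA = pA - pAB - pAC + x
    qB = pB - pAB - pBC + x
    qC = pC - pAC - pBC + x
    qbar = 1 - pA - pB - pC + pAB + pAC + pBC - x
    return (math.log(x) + math.log(qA) + math.log(qB) + math.log(qC)
            - math.log(pAB - x) - math.log(pAC - x)
            - math.log(pBC - x) - math.log(qbar))

a, b = lo + 1e-12, hi - 1e-12
for _ in range(100):  # bisection: g runs from -inf at lo to +inf at hi
    m = (a + b) / 2
    if g(m) < 0:
        a = m
    else:
        b = m
p_ABC = (a + b) / 2
print(p_ABC)
```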

ANSWER

Hint:

Say we have three events $A_1$, $A_2$, $A_3$. These in turn determine $8=2^3$ disjoint events $B_k$, indexed by $k\in \{ 0,1 \}^3$. For instance, we have $B_{(1,0,1)}= A_1\cap A_2^{c}\cap A_3$. Note that the $B_k$ are disjoint and $\cup_k B_k$ is the total space. Moreover, the various intersections of the $A_i$'s can be expressed in terms of the $B_k$. For instance

$$A_2=B_{(0,1,0)}\cup B_{(0,1,1)}\cup B_{(1,1,0)}\cup B_{(1,1,1)}$$ $$A_1\cap A_3 =B_{(1,0,1)}\cup B_{(1,1,1)}$$ $$A_1\cap A_2 \cap A_3 = B_{(1,1,1)}$$

Denote $p(B_k)=p_k$. Then we have $p_k\ge 0$ and $\sum_k p_k=1$. Now we add the extra linear conditions from knowing the $p(A_i)$'s and the $P(A_i\cap A_j)$'s. The $(p_k)$'s satisfying all these linear conditions form a polytope. Finding the range of $p_{(1,1,1)}$ then becomes a problem in linear programming. The possible range will be a segment, so we only need the maximal and minimal values, which one can get by solving two optimization problems (a min and a max).
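With three events this linear program is particularly simple: once $x = p_{(1,1,1)}$ is fixed, all eight elementary probabilities are affine functions of $x$ and the known quantities, so the polytope projects onto an interval obtained by intersecting the constraints $p_k \ge 0$. A Python sketch with hypothetical input numbers:

```python
# Hypothetical marginals and pairwise probabilities
pA, pB, pC = 0.5, 0.4, 0.35
pAB, pAC, pBC = 0.25, 0.10, 0.15

# Upper bounds on x: e.g. p(B_(1,1,0)) = pAB - x >= 0 gives x <= pAB,
# and p(B_(0,0,0)) >= 0 gives x <= 1 - pA - pB - pC + pAB + pAC + pBC
x_max = min(pAB, pAC, pBC, 1 - pA - pB - pC + pAB + pAC + pBC)

# Lower bounds on x: x >= 0, and e.g. p(B_(1,0,0)) = pA - pAB - pAC + x >= 0
x_min = max(0.0, pAB + pAC - pA, pAB + pBC - pB, pAC + pBC - pC)

print(x_min, x_max)  # feasible range of p_(1,1,1)
```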