First-time poster here (happy to edit if I am violating any guidelines, please just let me know) :)
I am curious whether the following formula from this paper by Li and Liang, which gives the probability of an interaction between two proteins $X$ and $Y$ based on their mutually interacting proteins, can be extended to the case of three or more proteins of interest. The PPI network described is a binary network representing some knowledge of protein-protein interactions.
The formula I would like to extend
Suppose that in a PPI network of size $N$, the degree (i.e., the number of interactions) for each protein node is fixed, but the interacting partners are randomly selected. This specifies the random network which we compare the real PPI data with. We randomly pick proteins $X$ and $Y$ ($X$ with $n_X$ interactions and $Y$ with $n_Y$ interactions) and find that $X$ and $Y$ share $m$ interacting partners (nodes) in this network. We denote the set of common partners as $A = \{G_1, G_2, ... ,G_m\}$, the set of all proteins as $\Omega$, and the number of interacting partners for each protein in $\Omega$ as $\kappa = \{n_1,n_2,...,n_N\}$.
The total number of graphs in which proteins $X$ and $Y$ have $m$ common partners is a product of three factors: (i) $m$ proteins can be chosen from any of the $N$ proteins, and there are $N \choose m$ ways to do that; (ii) the remaining $n_X - m$ proteins that interact only with protein $X$ can occupy $N - m$ spaces still available, resulting in a count of $N - m \choose n_X - m$; and (iii) $n_Y - m$ proteins that interact only with protein $Y$ can be in any available spaces, contributing a factor of $N - n_X \choose n_Y - m$. By multiplying these three factors and dividing by the total number of unrestricted ways for protein $X$ to have $n_X$ and protein $Y$ to have $n_Y$ interacting partners, we can arrive at the following formula (Algorithm I) by Samanta and Liang [28]:
$P_1(m|N,n_X,n_Y) = \frac{{N \choose m}{N - m \choose n_X - m}{N - n_X \choose n_Y - m}}{{N \choose n_X}{N \choose n_Y}}$
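As a sanity check on the counting argument, the formula can be compared against a brute-force enumeration on a tiny network. The sketch below assumes, as the formula itself does, that each protein's partner set is an arbitrary subset of all $N$ nodes; the function names `p1` and `brute_force_pair` are illustrative and not from the paper.

```python
from itertools import combinations
from math import comb


def p1(m, N, n_x, n_y):
    """Probability that X and Y share exactly m partners, per the formula above."""
    if m < 0 or m > min(n_x, n_y):
        return 0.0
    # math.comb returns 0 when k > n, which covers the case where the
    # partners that are exclusive to Y do not fit in the remaining nodes.
    numerator = comb(N, m) * comb(N - m, n_x - m) * comb(N - n_x, n_y - m)
    return numerator / (comb(N, n_x) * comb(N, n_y))


def brute_force_pair(N, n_x, n_y):
    """Enumerate every pair of partner sets and tabulate the overlap size."""
    counts = {}
    for partners_x in combinations(range(N), n_x):
        set_x = set(partners_x)
        for partners_y in combinations(range(N), n_y):
            m = len(set_x & set(partners_y))
            counts[m] = counts.get(m, 0) + 1
    total = comb(N, n_x) * comb(N, n_y)
    return {m: c / total for m, c in counts.items()}


if __name__ == "__main__":
    N, n_x, n_y = 6, 3, 2
    exact = brute_force_pair(N, n_x, n_y)
    for m in range(min(n_x, n_y) + 1):
        print(m, p1(m, N, n_x, n_y), exact.get(m, 0.0))
```

For these small values the two columns agree and the probabilities sum to 1, which matches the hypergeometric form ${n_X \choose m}{N - n_X \choose n_Y - m} / {N \choose n_Y}$ that the formula simplifies to.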
In the above, the nodes of interest (proteins in this case) are $X$ and $Y$, and I would like to extend this formula to calculate $P_1(m|N,n_X,n_Y,n_Z)$ given an additional node of interest $Z$ with $n_Z$ interactions.
My thinking so far:
Based on the logic presented above, my thinking is that the ${N \choose m}$ term would remain the same, as well as the ${N \choose n_X}$ and ${N \choose n_Y}$ terms in the denominator.
$m$ (previously the number of nodes with connections to both $X$ and $Y$) would now be the number of nodes with connections to $X$, $Y$, and $Z$.
There would be an additional term in the denominator ${N \choose n_Z}$.
So far that gives the below, with some terms still remaining to be figured out in the numerator:
$P_1(m|N,n_X,n_Y,n_Z) = \frac{{N \choose m} ... }{{N \choose n_X}{N \choose n_Y}{N \choose n_Z}}$
My intuition is that there will be a term in the numerator for each of the sets in the "Venn diagram" of sorts formed by the nodes connected to (proteins interacting with) $X$, $Y$, and $Z$. In this case there would be seven terms: three for the proteins only interacting with $X$, $Y$, and $Z$, respectively, three for those interacting with ($X, Y,$ but not $Z$), ($X, Z,$ but not $Y$), and ($Y, Z,$ but not $X$), and then a term for those interacting with all three.
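One way to test this intuition numerically is sketched below. It is an assumption-laden sketch, not a formula from the paper: it treats the seven Venn regions plus the "interacts with none" region as a partition of the $N$ nodes, counts each partition with a multinomial coefficient, and sums over the sizes of the three pairwise-only regions, since $m$, $n_X$, $n_Y$, and $n_Z$ alone do not fix them. The names `p_triple` and `brute_force_triple` are hypothetical.

```python
from itertools import combinations, product
from math import comb, factorial


def p_triple(m, N, n_x, n_y, n_z):
    """Probability that exactly m nodes interact with all of X, Y and Z.

    Assumes each protein's partner set is drawn independently and
    uniformly from all subsets of the N nodes with the given size.
    """
    if m < 0 or m > min(n_x, n_y, n_z):
        return 0.0
    ways = 0
    # a, b, c: nodes shared by exactly (X,Y), (X,Z), (Y,Z) respectively.
    for a in range(min(n_x, n_y) - m + 1):
        for b in range(min(n_x, n_z) - m + 1):
            for c in range(min(n_y, n_z) - m + 1):
                only_x = n_x - m - a - b
                only_y = n_y - m - a - c
                only_z = n_z - m - b - c
                rest = N - (m + a + b + c + only_x + only_y + only_z)
                sizes = (m, a, b, c, only_x, only_y, only_z, rest)
                if min(sizes) < 0:
                    continue  # this combination of region sizes is impossible
                # Multinomial coefficient N! / (product of region sizes!).
                term = factorial(N)
                for s in sizes:
                    term //= factorial(s)
                ways += term
    return ways / (comb(N, n_x) * comb(N, n_y) * comb(N, n_z))


def brute_force_triple(N, n_x, n_y, n_z):
    """Enumerate all triples of partner sets; feasible only for tiny N."""
    counts = {}
    nodes = range(N)
    for sx, sy, sz in product(combinations(nodes, n_x),
                              combinations(nodes, n_y),
                              combinations(nodes, n_z)):
        m = len(set(sx) & set(sy) & set(sz))
        counts[m] = counts.get(m, 0) + 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


if __name__ == "__main__":
    N, n_x, n_y, n_z = 5, 2, 2, 2
    exact = brute_force_triple(N, n_x, n_y, n_z)
    for m in range(min(n_x, n_y, n_z) + 1):
        print(m, p_triple(m, N, n_x, n_y, n_z), exact.get(m, 0.0))
```

If this reading is right, the numerator for three proteins is a sum of products over the Venn regions rather than a single product, because the pairwise overlaps are free to vary for a fixed $m$.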
My question
I would like to generalize the formula for $P_1$ to several nodes of interest (more than three), but am stuck on how to do this. The pattern for terms in the denominator is clear, but I do not see how to generalize the numerator.
Any help or clarification is appreciated!
Thank you!