Mutual information (see Wikipedia) is defined as
$I(X;Y) = \mathbb{E}\left[\log \frac{p(X,Y)}{p(X)p(Y)}\right]$, which, in the case of discrete random variables $X$ and $Y$, can be "simplified" to
$$\sum_{x,y}p(x,y)\log \frac{p(x,y)}{p(x)p(y)}\quad\quad (a)$$
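For concreteness, here is a minimal numerical sketch of $(a)$ in Python; the joint pmf below is just a made-up example, and natural logs are used, so the result is in nats:

```python
import numpy as np

# A small made-up joint pmf p(x, y) over two binary variables (rows: x, columns: y).
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

# Mutual information via formula (a): sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) )
mi = sum(p_xy[i, j] * np.log(p_xy[i, j] / (p_x[i] * p_y[j]))
         for i in range(2) for j in range(2))
print(mi)
```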
Wikipedia also says that the multivariate mutual information (MMI) is defined as
$$I(X_1;\ldots;X_{n}) = I(X_1;\ldots;X_{n -1}) - I(X_1;\ldots;X_{n - 1}|X_{n})$$
where $I(X_1;\ldots;X_{n-1}|X_{n}) = \mathbb E_{X_{n}}\big(I(X_1;\ldots;X_{n - 1})|X_{n}\big)\text{.}$
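As a sanity check on this recursion, here is a small Python sketch (my own helper, not from any reference) that evaluates the MMI of a joint pmf, given as a numpy array with one axis per variable, directly from the recursive definition; strict positivity of the entries is assumed only to avoid log-of-zero issues:

```python
import numpy as np

def mmi(p):
    """Multivariate mutual information of the variables of the joint pmf array p
    (one axis per variable), via the recursion
    I(X1;...;Xn) = I(X1;...;X_{n-1}) - E_{Xn}[ I(X1;...;X_{n-1}) | Xn ].
    Sketch: assumes a dense numpy array with strictly positive entries."""
    n = p.ndim
    if n == 2:
        px = p.sum(axis=1, keepdims=True)   # p(x1)
        py = p.sum(axis=0, keepdims=True)   # p(x2)
        return float((p * np.log(p / (px * py))).sum())
    p_last = p.sum(axis=tuple(range(n - 1)))            # marginal of X_n
    unconditional = mmi(p.sum(axis=-1))                  # I(X1;...;X_{n-1})
    conditional = sum(p_last[k] * mmi(p[..., k] / p_last[k])   # E_{Xn}[...]
                      for k in range(p.shape[-1]))
    return unconditional - conditional
```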
For $n = 3$, the situation is clear to me, since my intuition of what $\mathbb E_{X_{n}}\big(I(X_1;\ldots;X_{n - 1})|X_{n}\big)$ should be (simply condition all probabilities in $(a)$) agrees with the formula:
$$I(X_1;X_2|X_3) = \mathbb E_{X_3} \big(I(X_1;X_2)|X_3\big) = \sum_{x_3} p(x_3) \sum_{x_2} \sum_{x_1} p(x_1,x_2|x_3) \log \frac{p(x_1,x_2|x_3)}{p(x_1|x_3)p(x_2|x_3)}\text{.}$$
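In code, "condition all probabilities in $(a)$ and average over $p(x_3)$" looks like the following sketch (again a strictly positive joint pmf, given as a numpy array, is assumed):

```python
import numpy as np

def conditional_mi(p):
    """I(X1; X2 | X3) from a joint pmf array p of shape (|X1|, |X2|, |X3|),
    obtained by conditioning every probability in (a) on x3 and averaging over p(x3).
    Sketch; assumes strictly positive entries."""
    p3 = p.sum(axis=(0, 1))              # p(x3)
    p_12_given_3 = p / p3                # p(x1, x2 | x3)
    p_1_given_3 = p.sum(axis=1) / p3     # p(x1 | x3), shape (|X1|, |X3|)
    p_2_given_3 = p.sum(axis=0) / p3     # p(x2 | x3), shape (|X2|, |X3|)
    ratio = p_12_given_3 / (p_1_given_3[:, None, :] * p_2_given_3[None, :, :])
    return float((p * np.log(ratio)).sum())
```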
Moreover, in the case of $n = 3$, the MMI can be expressed by the "inclusion-exclusion principle" sum $$I(X_1;X_2;X_3) = \sum_{I \subseteq \{1,2,3\}} (-1)^{|I| + 1}H(X_I)\text{,}$$ where $H(X_I)$ is the joint entropy of the variables $X_i$ with $i\in I$.
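A quick numerical check that, for $n = 3$, the recursive definition and the inclusion-exclusion sum give the same number (the random pmf and the helper `H` are only for illustration):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))
p /= p.sum()                                   # random strictly positive joint pmf p(x1, x2, x3)

def H(axes_to_keep):
    """Joint entropy H(X_I) of the variables whose axes are listed in axes_to_keep."""
    marg = p.sum(axis=tuple(i for i in range(3) if i not in axes_to_keep))
    return -float((marg * np.log(marg)).sum())

# Inclusion-exclusion sum over all nonempty subsets I of {0, 1, 2}.
incl_excl = sum((-1) ** (len(I) + 1) * H(I)
                for r in range(1, 4) for I in itertools.combinations(range(3), r))

# Direct evaluation of I(X1;X2;X3) = I(X1;X2) - I(X1;X2|X3).
p12, p3 = p.sum(axis=2), p.sum(axis=(0, 1))
p1, p2 = p12.sum(axis=1), p12.sum(axis=0)
i12 = float((p12 * np.log(p12 / np.outer(p1, p2))).sum())
p13, p23 = p.sum(axis=1), p.sum(axis=0)
i12_given_3 = float((p * np.log(p * p3 / (p13[:, None, :] * p23[None, :, :]))).sum())
print(incl_excl, i12 - i12_given_3)            # the two values should agree
```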
For $n= 4$, I would say that
$I(X_1;X_2;X_3|X_4) = \sum_{x_4}p(x_4)\sum_{x_3} p(x_3|x_4)\sum_{x_2}\sum_{x_1} p(x_1,x_2|x_3,x_4)\log\frac{p(x_1,x_2|x_3,x_4)}{p(x_1|x_3,x_4)p(x_2|x_3,x_4)}$, but in this case, the above "inclusion-exclusion principle" sum does not hold, since some terms are missing. This answer to a similar question offers a nice suggestion for the definition of MMI, but there are two problems:
- The answer lacks a motivation for the definition.
- Only for odd $n$ does the definition agree with the fact that MMI should equal $$\sum_{I \subseteq \{1,2,\dots ,n\}} (-1)^{|I| + n}H(X_I)\text{.}$$
Can you give me an explicit formula for conditional MMI for $n\in \mathbb{N}$ or at least for $I(X_1;X_2;X_3|X_4)$ (and motivation for it)?
From here, I will probably be able to find the definition of MMI itself.
I'll use the shorthand $p_i$ for $p(x_i)$, and more generally $p_{12|4}$ for $p(x_1,x_2\mid x_4)$ and so on. Sums below are over the set $\mathcal{X}_1\times \mathcal{X}_2 \times \mathcal{X}_3 \times \mathcal{X}_4$, and it's tacitly assumed that all of these are finite sets.
\begin{align*}I(X_1;X_2;X_3|X_4) &= I(X_1;X_2|X_4) - I(X_1;X_2|X_3X_4) \\ &= \sum p_{1234} \log \frac{p_{12|4}}{p_{1|4} p_{2|4}} - \sum p_{1234} \log \frac{p_{12|34}}{p_{1|34} p_{2|34}} \\&= \sum p_{1234} \log \frac{p_{12|4} p_{1|34} p_{2|34}}{p_{1|4}p_{2|4}p_{12|34}},\end{align*}
where the first equality is by definition, and the rest are manipulation. But $p_{1|34}p_{3|4} = p_{13|4}$ by the chain rule, and likewise $p_{2|34}p_{3|4} = p_{23|4}$ and $p_{12|34}p_{3|4} = p_{123|4}$. We thus have $$I(X_1;X_2;X_3|X_4) = \sum p_{1234} \log \frac{p_{12|4}p_{13|4}p_{23|4}}{p_{1|4}p_{2|4}p_{3|4}^2 p_{12|34}} = \sum p_{1234} \log \frac{p_{12|4}p_{23|4}p_{13|4}}{p_{1|4}p_{2|4}p_{3|4} p_{123|4}}$$
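If it helps, here is a small numerical sketch (my own illustration, with a random strictly positive pmf and hypothetical helpers `cond` and `cmi`) checking that this symmetric expression agrees with the defining difference $I(X_1;X_2|X_4) - I(X_1;X_2|X_3X_4)$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2, 2))
p /= p.sum()                         # random strictly positive joint pmf p(x1, x2, x3, x4)

def cond(keep):
    """p(x_keep | x4): marginalize the axes of {0,1,2} not in keep, divide by p(x4)."""
    marg = p.sum(axis=tuple(i for i in range(3) if i not in keep))
    shape = [s if (i in keep or i == 3) else 1 for i, s in enumerate(p.shape)]
    return marg.reshape(shape) / p.sum(axis=(0, 1, 2))

# Right-hand side: the symmetric expression derived above.
ratio = (cond({0, 1}) * cond({1, 2}) * cond({0, 2})
         / (cond({0}) * cond({1}) * cond({2}) * cond({0, 1, 2})))
rhs = float((p * np.log(ratio)).sum())

# Left-hand side: I(X1;X2|X4) - I(X1;X2|X3,X4), computed from joint pmfs.
def cmi(pj):
    """I(A;B|C) for a joint pmf array pj of shape (|A|, |B|, |C|)."""
    pc = pj.sum(axis=(0, 1))
    return float((pj * np.log(pj * pc /
                  (pj.sum(axis=1)[:, None, :] * pj.sum(axis=0)[None, :, :]))).sum())

lhs = cmi(p.sum(axis=2)) - cmi(p.reshape(2, 2, 4))   # reshape merges (x3, x4)
print(lhs, rhs)                      # should agree
```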
which is a nice expression since it shows the symmetry of the functional explicitly. Now note that the expression you've written is actually $I(X_1;X_2|X_3X_4)$, and not $I(X_1;X_2;X_3|X_4)$.
Working further from this, you should be able to show that $$I(X_1;X_2;X_3;X_4) = \sum p_{1234} \log \frac{p_{12}p_{13}p_{14}p_{23}p_{24}p_{34} \cdot p_{1234}}{p_1 p_2 p_3 p_4 \cdot p_{123} p_{124} p_{134} p_{234}}$$
(the dots are the usual products and are only put in for clarity). From here the inclusion-exclusion expression in terms of entropies of subsets is obvious. Indeed, it's possible to show the same inductively for every $n$-fold expression (try it!).
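Here is one way to check the four-variable identity numerically; the helper names (`marg`, `H`) and the random pmf are mine, purely for illustration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
p = rng.random((2, 2, 2, 2))
p /= p.sum()                                  # random strictly positive joint pmf p(x1, x2, x3, x4)

def marg(I):
    """Marginal pmf of the variables with indices in I."""
    return p.sum(axis=tuple(i for i in range(4) if i not in I))

def H(I):
    """Joint entropy H(X_I)."""
    m = marg(I)
    return -float((m * np.log(m)).sum())

subsets = [I for r in range(1, 5) for I in itertools.combinations(range(4), r)]

# Inclusion-exclusion sum: singles - pairs + triples - H(X_{1234}).
incl_excl = sum((-1) ** (len(I) + 1) * H(I) for I in subsets)

# The product formula: pairs and p_1234 in the numerator, singles and triples in the denominator.
log_ratio = np.zeros_like(p)
for I in subsets:
    sign = +1 if len(I) in (2, 4) else -1
    shape = [s if i in I else 1 for i, s in enumerate(p.shape)]
    log_ratio += sign * np.log(marg(I).reshape(shape))
product_formula = float((p * log_ratio).sum())
print(incl_excl, product_formula)             # the two values should agree
```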
One operational meaning for the expression $I(X_1;X_2;\dots; X_k)$ is in the achievability region for the $k$-user broadcast channel determined by the coding scheme due to Marton. Check out ch. 8(?) in the book by El-Gamal & Kim. (NB - there may well be other motivations for the same; this is just the one I happen to know.)