Submodularity is a property of set functions $F: 2^{V} \rightarrow \mathbb{R}$ that assign to each subset $S \subseteq V$ a value $F(S)$. $F$ is called submodular if for all $A, B \subseteq V$ it holds that \begin{equation} F(A) + F(B) \geq F(A \cup B) + F(A \cap B) \end{equation}
Equivalently, a set function is submodular iff for all $A \subseteq B \subseteq V$ and $s \in V \setminus B$, it holds that \begin{equation} F(A \cup \{s\}) - F(A) \geq F(B \cup \{s\}) - F(B) \end{equation} This is the diminishing-returns characterization: adding $s$ to a smaller set helps at least as much as adding it to a larger one.
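To make the two definitions concrete, here is a small self-contained check (my own illustration, not part of the question) using the classic coverage function $F(S) = \left|\bigcup_{i \in S} C_i\right|$, which is well known to be submodular. The sets $C_i$ and all names in the snippet are invented for the example:

```python
from itertools import combinations
import random

random.seed(0)
V = range(5)
# Hypothetical coverage sets C_i, one per ground element
cover = {i: set(random.sample(range(10), 4)) for i in V}

def F(S):
    # Coverage function: F(S) = |union of C_i for i in S|, F(empty) = 0
    return len(set().union(*(cover[i] for i in S))) if S else 0

subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]

# First definition: F(A) + F(B) >= F(A u B) + F(A n B) for all A, B
assert all(F(A) + F(B) >= F(A | B) + F(A & B)
           for A in subsets for B in subsets)

# Second definition (diminishing returns): for A subset of B and s outside B,
# F(A + {s}) - F(A) >= F(B + {s}) - F(B)
assert all(F(A | {s}) - F(A) >= F(B | {s}) - F(B)
           for B in subsets for A in subsets if A <= B
           for s in set(V) - B)

print("Both submodularity definitions hold for the coverage function.")
```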
My question is: how can one prove that mutual information $I(X; Y_A)$, viewed as a set function of the index set $A$, is submodular?
Mutual information can be expressed as a set function $F(A) = I(X;Y_A)$, where we distinguish between the latent variables $X$ and the observed variables $Y$. For any disjoint pair $S, T \subseteq V$, we assume $Y_S$ is independent of $Y_T$ given $X$. Let $A \subseteq B \subseteq V$ and $j \in V \setminus B$. Since $(B \setminus A) \cup A = B$, we have: \begin{eqnarray} I(Y_j; Y_{B\setminus A} | Y_A) &\geq& 0 \\ H(Y_j|Y_A) - H(Y_j|Y_{(B\setminus A)\cup A}) &\geq& 0 \\ H(Y_j|Y_A) &\geq& H(Y_j|Y_B) \end{eqnarray} Because $H(Y_{A\cup\{j\}}) - H(Y_A) = H(Y_j|Y_A)$ by the chain rule for entropy, this is exactly the diminishing-returns property, i.e. the submodularity of the entropy set function.

By the conditional independence assumption, $H(Y_j|X) = H(Y_j|X,Y_A) = H(Y_j|X,Y_B)$. Subtracting this quantity from both sides gives: \begin{eqnarray} H(Y_j|Y_A) - H(Y_j|X) &\geq& H(Y_j|Y_B) - H(Y_j|X)\\ H(Y_j|Y_A) - H(Y_j|X,Y_A) &\geq& H(Y_j|Y_B) - H(Y_j|X, Y_B)\\ I(Y_j;X|Y_A) &\geq& I(Y_j;X|Y_B) \\ I(X;Y_j|Y_A) &\geq& I(X;Y_j|Y_B) \end{eqnarray} Finally, by the chain rule for mutual information, $F(A \cup \{j\}) - F(A) = I(X;Y_{A\cup\{j\}}) - I(X;Y_A) = I(X;Y_j|Y_A)$, so the last inequality is precisely the diminishing-returns condition for $F$, which proves submodularity.
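As a numerical sanity check on this proof (my own addition, not part of the original answer), the sketch below builds a toy naive-Bayes model, a binary latent $X$ with observations $Y_1, \dots, Y_n$ that are conditionally independent given $X$, and verifies the diminishing-returns inequality for $F(A) = I(X;Y_A)$ over all $A \subseteq B$ and $j \notin B$. All names (`joint`, `mutual_information`) and the chosen distributions are illustrative:

```python
from itertools import combinations, product
import numpy as np

rng = np.random.default_rng(0)
n = 4                                  # number of observed variables Y_1..Y_n
p_x = np.array([0.4, 0.6])             # P(X), binary latent variable
p_y_given_x = rng.uniform(0.1, 0.9, size=(n, 2))  # P(Y_i = 1 | X = x)

def joint(A):
    """Joint distribution P(X, Y_A) as an array indexed by (x, y_A),
    using conditional independence of the Y_i given X."""
    A = list(A)
    table = np.zeros([2] + [2] * len(A))
    for x in range(2):
        for ys in product([0, 1], repeat=len(A)):
            p = p_x[x]
            for i, y in zip(A, ys):
                q = p_y_given_x[i, x]
                p *= q if y == 1 else 1 - q
            table[(x,) + ys] = p
    return table

def mutual_information(A):
    """F(A) = I(X; Y_A) in nats; F(empty) = 0."""
    if not A:
        return 0.0
    pxy = joint(A)
    px = pxy.sum(axis=tuple(range(1, pxy.ndim)), keepdims=True)  # P(X)
    py = pxy.sum(axis=0, keepdims=True)                          # P(Y_A)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px * py)[mask])).sum())

V = range(n)
for size_b in range(n):
    for B in combinations(V, size_b):
        for size_a in range(size_b + 1):
            for A in combinations(B, size_a):
                for j in set(V) - set(B):
                    gain_A = mutual_information(A + (j,)) - mutual_information(A)
                    gain_B = mutual_information(B + (j,)) - mutual_information(B)
                    assert gain_A >= gain_B - 1e-12  # small float tolerance

print("Diminishing returns holds for all A subset of B and j outside B.")
```

Note that the conditional independence assumption is essential: without it, $I(X;Y_A)$ is not submodular in general, which is easy to see by replacing the naive-Bayes construction above with correlated observations.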