Is information entropy $H(X)$ a submodular function?


I was trying to learn more about submodular functions and wanted to see an example of proving that some function is submodular. Wikipedia gives entropy as an example, so I decided to try to prove it myself and see whether it really is submodular (since no proof is provided there).

I will provide the proof I have; however, I am not entirely satisfied with it and was wondering whether it is correct, or whether someone has a better proof.

Let's say we have $n$ discrete random variables $X_1, \dots, X_n$, and let $\mathcal{X}$ be the alphabet of values that each $X_i$ can take. For $A \subseteq \{1, \dots, n\}$, let $X_A = \{X_i : i \in A\}$ denote the corresponding set of random variables. Recall the information entropy of a set of random variables:

$$ H(X_A) = \sum_{a \in A} \sum_{x_a \in \mathcal{X}} p_{X_a}(x_a) \log\Big( \frac{1}{p_{X_a}(x_a)} \Big) $$
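As a sanity check on the formula as written (note that it sums the *marginal* entropies of the variables indexed by $A$), here is a minimal Python sketch; the distributions are made-up examples, not from the question:

```python
import math

def marginal_entropy(p):
    """Entropy (in bits) of a single distribution p, given as a list of probabilities."""
    return sum(px * math.log2(1.0 / px) for px in p if px > 0)

def H(dists, A):
    """H(X_A) as defined above: the sum of the marginal entropies
    of the variables indexed by A. `dists` maps index -> distribution
    (hypothetical example inputs)."""
    return sum(marginal_entropy(dists[a]) for a in A)

# Two example random variables over a binary alphabet
dists = {1: [0.5, 0.5], 2: [0.9, 0.1]}
print(H(dists, {1}))     # entropy of a fair coin: 1.0 bit
print(H(dists, {1, 2}))  # 1.0 + H(0.9, 0.1) ≈ 1.469 bits
```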

Now let's define our function $F(A) = H(X_A)$. If $F$ is submodular, then it should satisfy:

$$ \forall A \subseteq B,\ \forall s \notin B:\quad F(A \cup \{s\}) - F(A) \geq F(B \cup \{ s \}) - F(B) $$
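This diminishing-returns condition can be checked by brute force for any set function on a small ground set. A hypothetical helper (the names are my own, not standard API):

```python
from itertools import combinations

def powerset(items):
    """All subsets of items, as frozensets."""
    s = list(items)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def is_submodular(F, ground):
    """Brute-force check of the diminishing-returns condition:
    F(A ∪ {s}) - F(A) >= F(B ∪ {s}) - F(B) for all A ⊆ B, s ∉ B.
    F maps frozensets to numbers; ground is the ground set."""
    subsets = powerset(ground)
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for s in ground - B:
                gain_A = F(A | {s}) - F(A)
                gain_B = F(B | {s}) - F(B)
                if gain_A < gain_B - 1e-12:
                    return False
    return True

# Sanity check with a modular (hence submodular) function: F(S) = |S|
print(is_submodular(lambda S: len(S), frozenset({1, 2, 3})))  # True
```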

Let's check that property. Consider $A \subseteq B$ and $s \notin B$. Then we have:

$$F(A) = H(X_A)$$

$$F(B) = H(X_B)$$

$$ F(A \cup \{s\}) = H(X_{A \cup \{s\} }) = H(X_A) + H(X_s) $$

$$ F(B \cup \{s\}) = H(X_{B \cup \{s\} }) = H(X_B) + H(X_s) $$

where the splits follow because the definition above is an outer sum over the indices in the set.

Now let's compute the marginal gains:

$$ F(A \cup \{s\}) - F(A) = H(X_A) + H(X_s) - H(X_A) = H(X_s) $$

Similarly:

$$ F(B \cup \{s\}) - F(B) = H(X_B) + H(X_s) - H(X_B) = H(X_s) $$

Since the two marginal gains are always equal, the submodularity inequality is satisfied (with equality).
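The steps above can be replayed numerically under the definition of $F$ used here (a sum of marginal entropies); the distributions below are invented examples:

```python
import math

def h(p):
    """Entropy (in bits) of one discrete distribution."""
    return sum(px * math.log2(1 / px) for px in p if px > 0)

# Hypothetical marginal distributions for three variables
marginals = {1: [0.5, 0.5], 2: [0.9, 0.1], 3: [0.25, 0.75]}

def F(S):
    """F(S) = sum of the marginal entropies, per the definition above."""
    return sum(h(marginals[i]) for i in S)

A, B, s = frozenset({1}), frozenset({1, 2}), 3
gain_A = F(A | {s}) - F(A)
gain_B = F(B | {s}) - F(B)
print(math.isclose(gain_A, gain_B))  # True: both gains equal H(X_s)
```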

Is that correct? It seemed too easy, and the fact that the two marginal gains are always equal is pretty dissatisfying. It might be correct, but I am not sure.