I am studying information theory and am interested in the concept of mutual information, defined as $I(X;Y)=H(X)-H(X|Y)$, where $H(\cdot)$ is entropy. I interpret it as the reduction in uncertainty about $X$ given $Y$. I am wondering how mutual information differs from regression. Theoretically, running a regression tells us how $Y$ explains $X$, right? In that case, what is the major difference, and what is the advantage of mutual information?
Thanks.
Regression is intimately related to "predictability", in the sense of $\hat X = E(X\mid Y)$ being a prediction (a guess) of the random variable $X$ in terms of the variable $Y$. So, yes, "regression gets us how $Y$ explains $X$"... but only in that weak sense. It is true that if $X,Y$ are independent then $\hat X = E(X\mid Y) = E(X)$ (no regression: $X$ is "not predictable" from $Y$). But the reverse is not true: it can happen that $E(X\mid Y) = E(X)$ while $X,Y$ are still not independent. So the concept of lack of regression/predictability is weaker than the concept of independence. (And the concept of "uncorrelatedness" is weaker still.)
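A quick Monte Carlo sketch of that last point (the construction is my own, not from the question): take $Y \sim N(0,1)$, a fair $\pm 1$ coin $S$ independent of $Y$, and set $X = SY$. Then $E(X\mid Y) = Y\,E(S) = 0 = E(X)$, so the regression of $X$ on $Y$ is flat, yet $|X| = |Y|$, so the variables are very much dependent:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Y ~ N(0,1), S = fair +/-1 coin independent of Y, X = S*Y
y = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)
x = s * y

# Estimate E(X | Y in bin) over deciles of Y: the "regression curve" is flat at 0
edges = np.quantile(y, np.linspace(0, 1, 11))
idx = np.digitize(y, edges[1:-1])
cond_means = np.array([x[idx == k].mean() for k in range(10)])
print(np.max(np.abs(cond_means)))  # near 0: Y does not predict X at all

# Yet X and Y are dependent: X^2 equals Y^2 exactly, since S^2 = 1
corr_sq = np.corrcoef(x**2, y**2)[0, 1]
print(corr_sq)  # ~ 1.0
```

The binned conditional means hover around zero in every decile of $Y$, while $X^2$ and $Y^2$ are perfectly correlated.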
Also, regression is not symmetric: it can happen that $E(X\mid Y) = E(X)$ but $E(Y \mid X) \ne E(Y)$, so one would say that $X$ helps us guess $Y$, while $Y$ does not help us guess $X$.
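A minimal sketch of this asymmetry (again my own construction): let $X \sim N(0,1)$ and $Y = X^2$. By symmetry $E(X \mid Y) = 0 = E(X)$, because $Y$ cannot distinguish $+\sqrt{y}$ from $-\sqrt{y}$; but $E(Y \mid X) = X^2$, which varies with $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X ~ N(0,1) and Y = X^2
x = rng.standard_normal(n)
y = x**2

def binned_means(by, of, nbins=10):
    """Estimate E(of | by in bin) over equal-probability bins of `by`."""
    edges = np.quantile(by, np.linspace(0, 1, nbins + 1))
    idx = np.digitize(by, edges[1:-1])
    return np.array([of[idx == k].mean() for k in range(nbins)])

ex_given_y = binned_means(y, x)  # flat near 0: Y does not help guess X
ey_given_x = binned_means(x, y)  # U-shaped, tracking x^2: X pins Y down
print(np.round(ex_given_y, 3))
print(np.round(ey_given_x, 3))
```

Regressing $X$ on $Y$ gives a flat line at zero; regressing $Y$ on $X$ gives a strongly varying curve, even though the underlying pair of variables is the same.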
By contrast, mutual information $I(X;Y)$ is symmetric, and it is zero if and only if $X,Y$ are independent.
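Here is a small discrete example (a toy pmf I chose for illustration) where every regression is flat but the mutual information is large: $Y$ is a fair coin; if $Y=0$ then $X=0$, and if $Y=1$ then $X=\pm 1$ with equal probability. Then $E(X\mid Y)=0=E(X)$ for both values of $y$, yet $Y=|X|$ is a deterministic function of $X$, and indeed $I(X;Y) = H(Y) = 1$ bit:

```python
from math import log2

xs = [-1, 0, 1]
ys = [0, 1]
# Joint pmf p(x, y); all cells not listed have probability 0
p = {(0, 0): 0.5, (-1, 1): 0.25, (1, 1): 0.25}

# Marginals p(x) and p(y)
px = {x: sum(p.get((x, y), 0.0) for y in ys) for x in xs}
py = {y: sum(p.get((x, y), 0.0) for x in xs) for y in ys}

# Zero regression: E(X | Y=y) = 0 = E(X) for both values of y
cond_mean = {y: sum(x * p.get((x, y), 0.0) for x in xs) / py[y] for y in ys}
print(cond_mean)  # {0: 0.0, 1: 0.0}

# Mutual information I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x)p(y)) ), in bits
mi = sum(q * log2(q / (px[x] * py[y])) for (x, y), q in p.items())
print(mi)  # 1.0 bit: far from independent, despite the flat regression
```

Note that the formula for `mi` treats $X$ and $Y$ identically, which is exactly the symmetry mentioned above; swapping the roles of the two variables gives the same number.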
This is just one difference; there are others. In short, despite having similar interpretations (both measure how two random variables are related), they are constructed with different goals and have different properties.