Consider the following formula for the mutual information (MI) between continuous random variables X and Y, where $f$ is their joint density and $g$, $h$ are the marginal densities of X and Y respectively:
$$I(X; Y) = \iint f(x,y) \log\left(\frac{f(x,y)}{g(x)h(y)}\right) \, dx \, dy$$
I've read elsewhere that, unlike the Pearson correlation, MI does not capture the direction of the dependence (i.e., positive or negative relationship) but only its magnitude. Focusing on a particular pair (x, y), my current thoughts are:
$f(x,y) > g(x)h(y)$: This is the case if the pair (x, y) occurs more often than it would under independence, i.e., the variables are dependent (either positively or negatively) at that particular point.
$f(x,y) = g(x)h(y)$: At this point the joint density matches the product of the marginals; if this equality holds at every point, the two variables are independent.
$f(x,y) < g(x)h(y)$: So if (>) captures dependence (co-occurrence) and (=) independence, what is the interpretation of this case? I originally suspected that it captures negative dependence. However, this cannot be the case, because negative dependence would then be penalized: the argument of the logarithm would be less than 1, making the term negative.
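To make the question concrete, here is a small discrete sketch: even when X and Y are positively associated, the cells where the joint probability falls below the product of the marginals contribute negative pointwise terms, yet the expectation over the joint remains non-negative. (The 2×2 joint table below is an illustrative choice of mine, not a distribution from any particular source.)

```python
import numpy as np

# Illustrative 2x2 joint distribution of binary X and Y (positively associated)
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

px = joint.sum(axis=1)  # marginal of X, playing the role of g(x)
py = joint.sum(axis=0)  # marginal of Y, playing the role of h(y)

# Pointwise terms log(f(x,y) / (g(x) h(y))) for each cell
pointwise = np.log(joint / np.outer(px, py))

# MI is the expectation of the pointwise terms under the joint
mi = np.sum(joint * pointwise)

print(pointwise)  # diagonal cells positive, off-diagonal cells negative
print(mi)         # the total is still non-negative
```

The off-diagonal cells have `joint < px * py`, so their log terms are negative, but they are weighted by small joint probabilities; the overall sum is still positive.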
Q1: Does $f(x,y) > g(x)h(y)$ imply either positive or negative dependence, or only positive?
Q2: What does $f(x,y) < g(x)h(y)$ imply about the variables? If it is some form of dependence, why is it penalized in the formula? And if it is indeed penalized, why is MI considered non-directional when it appears to penalize negative dependence?