In a course that touches on information theory, $I(X, Y)$ is mentioned and defined by the formula:
$$I(X, Y) = \log_2\frac{p(X/Y)}{p(X)}$$
where $p(X/Y)$ is the conditional probability of event $X$ happening given that event $Y$ has already happened.
I think I've seen the notation $I(X, Y)$ used for mutual information in a couple of places. However, mutual information is always non-negative, while the definition above allows negative values, namely whenever $p(X/Y) < p(X)$. What, then, does $I(X, Y)$ defined this way mean?
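To check that this can indeed go negative, I wrote a small Python script with a made-up joint distribution over two binary variables (the numbers are just for illustration):

```python
import math

# Hypothetical joint distribution over X in {0, 1} and Y in {0, 1}
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}  # marginal of X
p_y = {0: 0.5, 1: 0.5}  # marginal of Y

def i_term(x, y):
    """The per-outcome quantity from the lecture: log2(p(x|y) / p(x))."""
    p_x_given_y = p_xy[(x, y)] / p_y[y]
    return math.log2(p_x_given_y / p_x[x])

print(i_term(0, 0))  # positive: observing Y=0 makes X=0 more likely
print(i_term(0, 1))  # negative: observing Y=1 makes X=0 less likely
```

So for individual outcomes the value is positive when $Y$ makes $X$ more likely and negative when it makes it less likely.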
Also, how would I connect this to the intuition behind the amount of information $I(X)$ ("the minimal number of yes/no questions needed to...")?
The expression you wrote is the pointwise (per-outcome) term; its average over all outcomes is the mutual information,
$$I(X;Y)=\sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}$$
(with the sum replaced by an integral for continuous variables). Individual terms can be negative, but the average is always non-negative.
Notice that it is symmetric: exchanging $x$ and $y$ leaves it unchanged.
It is a measure of how much information is shared by $X$ and $Y$; see the famous Venn diagram relating it to the entropies $H(X)$, $H(Y)$, and $H(X,Y)$.
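To make this concrete, here is a short Python sketch computing $I(X;Y)$ for a made-up joint distribution over two binary variables; the weighted sum comes out non-negative even though some individual log-ratio terms are negative:

```python
import math

# Hypothetical joint distribution over X in {0, 1} and Y in {0, 1}
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y), obtained by summing the joint
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
mi = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())
print(mi)  # non-negative, as mutual information always is
```

The terms for $(0,1)$ and $(1,0)$ are negative, but they are outweighed by the positive terms in the average.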