One way to define the mutual information is
$I(X;Y) = H(X) - H(X|Y)$
I have found it useful to look at the related quantity
$?(X;Y=y) = H(X) - H(X|Y=y)$
That is, we look at how much the entropy of $X$ is decreased given a particular outcome $y$ for $Y$.
It is not hard to see that the mutual information is recovered in expectation over $Y$:
$$\begin{align} E_Y[?(X;Y=y)] &= \sum_y p(y)\,(H(X) - H(X|Y=y)) \\ &= H(X) - \sum_y p(y)\,H(X|Y=y) \\ &= H(X) - H(X|Y) \\ &= I(X;Y) \end{align}$$
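As a sanity check, the identity above is easy to verify numerically on a small joint distribution. A minimal sketch in Python/NumPy (the joint distribution here is made up purely for illustration):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical 2x3 joint distribution p(x, y), for illustration only.
pxy = np.array([[0.2, 0.1, 0.1],
                [0.1, 0.3, 0.2]])
px = pxy.sum(axis=1)  # marginal p(x)
py = pxy.sum(axis=0)  # marginal p(y)

# The per-outcome quantity H(X) - H(X | Y=y), one value per outcome y.
I2 = np.array([entropy(px) - entropy(pxy[:, y] / py[y])
               for y in range(len(py))])

# Mutual information computed directly as H(X) - H(X|Y).
H_X_given_Y = sum(py[y] * entropy(pxy[:, y] / py[y]) for y in range(len(py)))
I_XY = entropy(px) - H_X_given_Y

# Its expectation over Y recovers the mutual information.
assert np.isclose(py @ I2, I_XY)
```

Note that, unlike the mutual information itself, this per-outcome quantity can be negative for a particular $y$ (observing $y$ can increase the uncertainty about $X$), even though its expectation over $Y$ is nonnegative.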
My question: does my $?(X;Y=y)$ function have a name? Or a standardized notation?
Note that it is not the same as the pointwise mutual information; rather, I believe it is the expectation of the pointwise mutual information taken over $X$ only (rather than over both variables). So it sits "in between" the ordinary mutual information and the pointwise version.
This measure was probably first proposed and analyzed in
More recently, it has attracted attention in the neuroscience community; in particular, see the definition of $I_2$ in Eq. 6 of DeWeese and Meister's paper, and the various articles that cite it. In most of these citing articles, $I_2$ is called "specific information". Some care is required, however, because "specific information" can also (more rarely) refer to $I_1 := D(p(X|y)\,\|\,p(X))$; note that the mutual information can be written as an expectation of either $I_1$ or $I_2$, and $I_1$ is sometimes called "specific surprise".
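To see concretely that $I_1$ and $I_2$ differ outcome by outcome yet agree in expectation, here is a small numerical sketch (Python/NumPy; the joint distribution is made up for illustration):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(p, q):
    """KL divergence D(p || q) in bits, assuming supp(p) is within supp(q)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Hypothetical 2x2 joint distribution p(x, y), for illustration only.
pxy = np.array([[0.25, 0.05],
                [0.05, 0.65]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# I1(y) = D(p(X|y) || p(X))        -- "specific surprise"
I1 = np.array([kl(pxy[:, y] / py[y], px) for y in range(len(py))])
# I2(y) = H(X) - H(X | Y=y)        -- "specific information"
I2 = np.array([entropy(px) - entropy(pxy[:, y] / py[y]) for y in range(len(py))])

# The two decompositions differ per outcome...
assert not np.allclose(I1, I2)
# ...but both average (over Y) to the mutual information I(X;Y).
assert np.isclose(py @ I1, py @ I2)
```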
$I_2$ also goes by various other names, including "predictability" (R Bramon et al. "Multimodal data fusion based on mutual information." IEEE Transactions on Visualization and Computer Graphics (2012): 1574-1587) and the "i-measure" (P Smyth and RM Goodman. "An information theoretic approach to rule induction from databases." IEEE Transactions on Knowledge and Data Engineering (1992): 301-316, Eq. 2).