Imagine a Binary Erasure Channel as depicted on Wikipedia.
One equation describing the mutual information is the following:
$$\begin{array}{rcl} I(x;y) &=& H(x) - H(x|y) \\ &=& H(x) - \left(P(y=0) \cdot 0 + P(y=\varepsilon) \cdot H(x) +P(y=1) \cdot 0\right) \end{array} $$
Why is it $P(y=\varepsilon) \cdot H(x)$ and not $P(y=\varepsilon) \cdot H(x|y=\varepsilon)$?
When you observe $Y=\varepsilon$, you learn nothing about $X$, because an erasure occurs with the same probability whether $X=0$ or $X=1$ was sent. Knowing $Y=\varepsilon$ therefore does not change our uncertainty about $X$, so $H(X|Y=\varepsilon)=H(X)$ and the two expressions are the same.
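To justify this mathematically, start from Bayes' rule:

$$ \mathbb P(X=0\mid Y=\varepsilon)=\frac{\mathbb P(X=0)\,\mathbb P(Y=\varepsilon\mid X=0)}{\mathbb P(Y=\varepsilon)} $$

Let $\mathbb P(X=0)=p$ and let $p_e>0$ denote the erasure probability, so that $\mathbb P(Y=\varepsilon\mid X=0)=\mathbb P(Y=\varepsilon\mid X=1)=p_e$. By the law of total probability:

$$ \mathbb P(Y=\varepsilon)=\mathbb P(Y=\varepsilon\mid X=0)\,\mathbb P(X=0)+\mathbb P(Y=\varepsilon\mid X=1)\,\mathbb P(X=1)=p_e \,p+p_e\, (1-p)=p_e $$

Therefore:

$$ \mathbb P(X=0\mid Y=\varepsilon)=\frac{p \cdot p_e}{p_e}=p=\mathbb P(X=0) $$

and likewise $\mathbb P(X=1\mid Y=\varepsilon)=1-p=\mathbb P(X=1)$. The distribution of $X$ given $Y=\varepsilon$ is the same as the prior distribution of $X$, which shows that:

$$ H(X|Y=\varepsilon)=H(X) $$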
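As a quick numerical sanity check of the derivation above, here is a minimal Python sketch. The function names `entropy` and `bec_quantities`, the string labels for the outputs, and the values of `p` and `p_e` are all made up for illustration. It builds the joint distribution of a binary erasure channel, computes $H(X|Y=y)$ for each output, and compares the result with $H(X)$.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def bec_quantities(p, p_e):
    """Return H(X), the map y -> H(X|Y=y), and I(X;Y) for a BEC.

    p   : P(X=0)            (illustrative parameter name)
    p_e : erasure probability (illustrative parameter name)
    The output symbols are labelled "0", "e" (erasure) and "1".
    """
    # Joint distribution P(X=x, Y=y); pairs that cannot occur are omitted.
    joint = {
        (0, "0"): p * (1 - p_e),
        (0, "e"): p * p_e,
        (1, "e"): (1 - p) * p_e,
        (1, "1"): (1 - p) * (1 - p_e),
    }

    # Marginal P(Y=y) by summing the joint over x.
    p_y = {}
    for (x, y), q in joint.items():
        p_y[y] = p_y.get(y, 0.0) + q

    # H(X|Y=y) from the posterior P(X=x|Y=y); skip outputs with zero probability.
    h_x_given_y = {}
    for y, py in p_y.items():
        if py > 0:
            posterior = [joint.get((x, y), 0.0) / py for x in (0, 1)]
            h_x_given_y[y] = entropy(posterior)

    h_x = entropy([p, 1 - p])
    mutual_info = h_x - sum(p_y[y] * h for y, h in h_x_given_y.items())
    return h_x, h_x_given_y, mutual_info

h_x, h_cond, mi = bec_quantities(p=0.3, p_e=0.2)
print("H(X)       =", h_x)
print("H(X|Y=eps) =", h_cond["e"])
print("I(X;Y)     =", mi, "vs (1 - p_e) * H(X) =", (1 - 0.2) * h_x)
```

With these inputs, $H(X|Y=\varepsilon)$ should match $H(X)$ up to floating-point error, while $H(X|Y=0)$ and $H(X|Y=1)$ come out as zero, so the mutual information reduces to $(1-p_e)\,H(X)$.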