I would like to calculate $H(Y|X)$; however, for some values of $X$ and $Y$, I have probabilities which are zero. I understand that when calculating an entropy such as $H(X) = -\sum p(x)\log p(x)$, if $p(x) = 0$ for some term, that term can be taken to be $0$. However, I am unsure how to deal with this in the conditional entropy case. For example, let us say I have a term from calculating $H(Y|X)$:
$$-p(x_i,y_i)\log\left(\frac{p(x_i,y_i)}{p(x_i)}\right)$$
Firstly: What if $p(x_i,y_i) = 0$?
Secondly: What if $p(x_i) = 0$?
Thirdly: What if both are $0$?
I've attempted to add some sort of smoothing (such as +1 on the denominator), but I don't know of any good justification for doing so. Is there a standard way to deal with this? Thank you.
The standard way is to set the corresponding entropy term to $0$ whenever $p(x_i,y_i) = 0$. The justification is that $\lim_{t\to 0^+} t\log t = 0$, so the term vanishes continuously as the probability goes to $0$. This covers all three of your cases: since $p(x_i,y_i) \le p(x_i)$, the case $p(x_i) = 0$ forces $p(x_i,y_i) = 0$ as well, so your second and third cases coincide, and in every case the term is $0$. No smoothing is needed.
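A minimal sketch of this convention in Python (the joint distribution below is made up for illustration; note it contains zero entries in both the joint and, potentially, the marginal):

```python
import numpy as np

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
# Some entries are zero on purpose.
p_xy = np.array([
    [0.25, 0.25, 0.0],
    [0.0,  0.5,  0.0],
])

p_x = p_xy.sum(axis=1)  # marginal p(x)

h = 0.0
for i in range(p_xy.shape[0]):
    for j in range(p_xy.shape[1]):
        # Convention: any term with p(x_i, y_j) = 0 contributes 0,
        # justified by lim_{t -> 0+} t*log(t) = 0. This also handles
        # p(x_i) = 0, since p(x_i, y_j) <= p(x_i) forces the joint
        # probability to be 0 as well, so the term is simply skipped.
        if p_xy[i, j] > 0:
            h -= p_xy[i, j] * np.log2(p_xy[i, j] / p_x[i])

print(h)  # H(Y|X) in bits
```

Skipping the zero-probability terms in the loop is exactly the "set the term to $0$" rule; it also avoids the division-by-zero and $\log 0$ warnings that a vectorized one-liner would raise.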