Formula for the joint entropy of two discrete random variables:
$$ H(X, Y)=-\sum_{i=1}^n \sum_{j=1}^m p\left(x_i, y_j\right) \log p\left(x_i, y_j\right) $$
I don't really see how the above formula works. How does it also account for the $H(Y \mid X)$ region in the image below? Isn't the formula only looking at the cases where $x_i$ (for $i=1,\ldots,n$) and $y_j$ (for $j=1,\ldots,m$) occur at the same time? But the part of $Y$ on the right doesn't occur together with any $x$, so how does the formula also count that part? I'm really confused.
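
To make the question concrete, here is a minimal numeric sketch that evaluates the formula above on a small joint table and compares it with $H(X) + H(Y \mid X)$. The 2x2 table, the function names, and the choice of base-2 logarithms are all assumptions made for illustration; none of them come from the formula or the image.

```python
import math

# Hypothetical joint distribution p(x_i, y_j): rows index x, columns index y.
# The entries are made up for illustration and sum to 1.
joint = [
    [0.25, 0.25],
    [0.40, 0.10],
]

def joint_entropy(p):
    """H(X, Y) = -sum_ij p(x_i, y_j) * log2 p(x_i, y_j), skipping zero cells."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

def marginal_entropy_x(p):
    """H(X), computed from the row sums p(x_i) = sum_j p(x_i, y_j)."""
    px = [sum(row) for row in p]
    return -sum(v * math.log2(v) for v in px if v > 0)

def conditional_entropy_y_given_x(p):
    """H(Y|X) = -sum_ij p(x_i, y_j) * log2( p(x_i, y_j) / p(x_i) )."""
    total = 0.0
    for row in p:
        px = sum(row)
        for v in row:
            if v > 0:
                total -= v * math.log2(v / px)
    return total

h_xy = joint_entropy(joint)
h_x = marginal_entropy_x(joint)
h_y_given_x = conditional_entropy_y_given_x(joint)

# The two printed numbers agree up to floating-point rounding.
print(h_xy, h_x + h_y_given_x)
```

In this sketch the two printed values match, which suggests that the double sum over $p(x_i, y_j)$ already contains the $H(Y \mid X)$ contribution: every cell of the table carries information about both which $x$ and which $y$ occurred, not only about their overlap.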
