We all know the usual Venn diagram for mutual information:
(Source: Wikipedia.)
The visualization, together with the accompanying description, makes the joint entropy H(X,Y) look the same as the mutual information I(X;Y), which of course it is not.
For example, we know that the joint probability of two events corresponds to their intersection, i.e. the region where the two circles overlap.
But the mutual information is also drawn where the two circles overlap.
Wikipedia says the joint entropy is "the area contained by both circles." But this is also the mutual information.
Why are joint entropy and mutual information depicted the same way?


No. The joint entropy H(X,Y) is the area enclosed by the union of the two circles; the mutual information I(X;Y) is their intersection. The phrase "the area contained by both circles" means the area covered by either circle taken together (the union), not the overlap. The two regions coincide only in the degenerate case where the circles are identical, i.e. X determines Y and vice versa.
This alternative diagram (from MacKay's Information Theory, Inference, and Learning Algorithms) can help:
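The diagram's identities are easy to check numerically. Here is a minimal sketch, using a hypothetical joint distribution over two binary variables, that computes H(X), H(Y), H(X,Y), and I(X;Y) and confirms that the "union" relation H(X,Y) = H(X) + H(Y) − I(X;Y) holds:

```python
from math import log2

# Hypothetical joint pmf p(x, y) for two correlated binary variables.
p_xy = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

def H(dist):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Marginal distributions of X and Y.
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_x, H_y, H_xy = H(p_x), H(p_y), H(p_xy)

# Mutual information: the overlap of the two circles.
I_xy = H_x + H_y - H_xy

print(f"H(X)   = {H_x:.4f}")   # one circle
print(f"H(Y)   = {H_y:.4f}")   # the other circle
print(f"H(X,Y) = {H_xy:.4f}")  # the union of both circles
print(f"I(X;Y) = {I_xy:.4f}")  # the intersection
```

Note that H(X,Y) comes out strictly larger than I(X;Y) here, as the union must be at least as large as the intersection; they would only be equal if X and Y determined each other completely.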