I am trying to build intuition for the various entropies depicted in the diagram below:
I am trying to compare it with the probability-distribution Venn diagram:

I am also keeping the following definitions in mind while trying to understand the I-diagram:
- The entropy of a random variable is the average level of "information" inherent in the variable's possible outcomes:
$$H(X)=E[I(X)]=E[-\log_2 P(X)]=-\sum_{i=1}^n P(x_i)\log_2 P(x_i)$$
- The conditional entropy quantifies the amount of information needed to describe the outcome of a random variable $Y$ given that the value of another random variable $X$ is known.
$$H(Y|X)=-\sum_{x\in X,y\in Y}p(x,y)\log_2 \frac{p(x,y)}{p(x)}$$
- Joint entropy is a measure of the uncertainty associated with a set of variables.
$$H(X,Y)=-\sum_{x\in X,y\in Y}p(x,y)\log_2 p(x,y)$$
- Mutual information quantifies the "amount of information" obtained about one random variable through observing the other random variable.
$$I(X;Y)=\sum_{x\in X,y\in Y}p(x,y)\log_2\left(\frac{p(x,y)}{p(x)p(y)} \right)$$
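All four definitions above can be computed directly as sums over a joint distribution. A minimal sketch in Python, assuming a small made-up joint distribution over two binary variables (the probabilities are purely illustrative):

```python
import math

# Hypothetical joint distribution p(x, y) over binary X and Y,
# chosen purely for illustration.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y), obtained by summing out the other variable.
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# H(X) = -sum_x p(x) log2 p(x)
H_X = -sum(p * math.log2(p) for p in p_x.values())

# H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y)
H_XY = -sum(p * math.log2(p) for p in p_xy.values())

# H(Y|X) = -sum_{x,y} p(x,y) log2( p(x,y) / p(x) )
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

# I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
I_XY = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

print(H_X, H_XY, H_Y_given_X, I_XY)
```

For this particular distribution $X$ is marginally a fair coin, so $H(X)=1$ bit, and the computed values satisfy $H(X,Y)=H(X)+H(Y|X)$, which is exactly the additivity of areas in the information diagram.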
Doubts
I can intuitively understand the labels of the probability-distribution Venn diagram, but not those of the information diagram. The probability labels have a simple meaning: $P(A)$, for example, is just the fraction of the population in which $A$ occurs. But $H(X)$ is itself an expectation, so the same clear-cut intuition does not seem available, and we more or less have to memorize the information-diagram labels. Is there a simpler intuition behind them?
How did the conditional-entropy labels $H(X|Y)$ and $H(Y|X)$ come about, especially given that we cannot label any part of a Venn diagram with a conditional probability?
Also, why is that region labeled $H(X|Y)$, given that it lies outside $H(Y)$?
Can we draw any mathematical intuition for the information-diagram label $I(X;Y)$?
Wikipedia does not give a good "logical" definition of joint entropy. Can we have one that leads to the label $H(X,Y)$? Also, is it parallel to the joint distribution, especially given that we cannot label any part of a Venn diagram with a joint distribution?

The information diagram is just a way to represent the fundamental equations $$ H(X,Y)=H(X)+H(Y|X), $$
and
$$ I(X;Y)=H(X)-H(X|Y), $$
and their versions with $X$ and $Y$ exchanged. Nothing more than that.
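Both identities hold for every joint distribution, which is easy to check numerically. A quick sketch, again using a hypothetical distribution (note that one cell, $p(1,0)=0$, is simply omitted from the sums):

```python
import math

# Hypothetical joint distribution, chosen only to check the identities.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}

# Marginals
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values())

# Conditional entropies and mutual information, straight from the definitions.
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())
H_X_given_Y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items())
I_XY = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

# H(X,Y) = H(X) + H(Y|X)
assert abs(H(p_xy) - (H(p_x) + H_Y_given_X)) < 1e-9
# I(X;Y) = H(X) - H(X|Y)
assert abs(I_XY - (H(p_x) - H_X_given_Y)) < 1e-9
```

The asserts pass because the identities are algebraic consequences of the definitions, not properties of a particular distribution: expand each sum of logs and the terms cancel.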