Can anyone conceptually explain the difference between $I(X;Y;Z)$ and $I(X,Y;Z)$?
where, by the chain rule, $I(X,Y;Z)=I(X;Z)+I(Y;Z|X)$.
Basically, what do the semicolon and the comma mean in mutual information? In probability, a comma basically means joint, i.e. simply "and".
Thanks.
Basically $I(X,Y;Z)$ denotes the mutual information between the pair $(X,Y)$ and $Z$, which can be written as follows: $$ I(X,Y;Z)=H(Z)-H(Z|X,Y). $$ Intuitively, this quantity measures how much we learn about $Z$ by knowing both $X$ and $Y$. For example, if $Z=f(X,Y)$ then $Z$ is fully determined by knowing $X$ and $Y$, so the mutual information in this case equals all the information in $Z$, namely $H(Z)$. In general, mutual information with one semicolon represents the amount of information that can be known about the set of random variables on one side of the semicolon by knowing the set of random variables on the other side.
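As a quick sanity check, the identity $I(X,Y;Z)=H(Z)-H(Z|X,Y)$ can be verified numerically. The sketch below uses a hypothetical toy distribution (independent fair bits $X$, $Y$ with $Z=X\oplus Y$, not from the question) and expands the conditional entropy as $H(Z|X,Y)=H(X,Y,Z)-H(X,Y)$:

```python
import math

# Hypothetical toy distribution: X, Y independent fair bits, Z = X XOR Y.
# p(x, y, z) puts mass 1/4 on each consistent triple.
p = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

def H(idx):
    """Entropy in bits of the marginal over the given coordinate indices."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

# I(X,Y;Z) = H(Z) - H(Z|X,Y) = H(Z) + H(X,Y) - H(X,Y,Z)
I_XY_Z = H((2,)) + H((0, 1)) - H((0, 1, 2))
print(I_XY_Z)  # 1.0 = H(Z): Z is a function of (X, Y), so it is fully known
```

Here $Z$ is a function of $(X,Y)$, so the result is exactly $H(Z)=1$ bit, even though each of $X$ and $Z$, or $Y$ and $Z$, is individually independent.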
Therefore $I(X;Y)$ is simply the information common to $X$ and $Y$. Now, if we are interested in the information common to all of $X$, $Y$, and $Z$, we use $I(X;Y;Z)$, which is defined as follows: $$ I(X;Y;Z)=I(X;Y)-I(X;Y|Z). $$ The term $I(X;Y|Z)$ can be interpreted as the information common to $X$ and $Y$ beyond $Z$, i.e. the information shared by $X$ and $Y$ but not contained in $Z$. If we subtract this value from the common information of $X$ and $Y$, we expect to get something shared by all of $X$, $Y$, and $Z$. This is different from $I(X,Y;Z)$, where we were interested in the information common to $Z$ and the pair $(X,Y)$ jointly.
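The definition $I(X;Y;Z)=I(X;Y)-I(X;Y|Z)$ can likewise be evaluated numerically. A minimal sketch, using a hypothetical distribution (one fair bit copied three times, so $X=Y=Z$, chosen purely for illustration) and computing every mutual information from entropies:

```python
import math
from collections import Counter

# Hypothetical toy distribution: one fair bit copied so that X = Y = Z.
p = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

def H(idx):
    """Entropy in bits of the marginal over the given coordinate indices."""
    marg = Counter()
    for outcome, prob in p.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

def I(a, b, cond=()):
    """Conditional mutual information I(A;B|C), via entropies."""
    return H(a + cond) + H(b + cond) - H(a + b + cond) - H(cond)

# I(X;Y;Z) = I(X;Y) - I(X;Y|Z)
I_XYZ = I((0,), (1,)) - I((0,), (1,), cond=(2,))
print(I_XYZ)  # 1.0: the single shared bit is common to all three variables
```

Here $I(X;Y)=1$ bit while $I(X;Y|Z)=0$ (once $Z$ is known, $X$ and $Y$ are determined), so one full bit is common to all three variables, matching the intuition above.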
Set-theoretic analogies should be used with care in information theory, but they can still provide some intuition: $I(X,Y;Z)$ is more like $(X\cup Y)\cap Z$, while $I(X;Y;Z)$ is more like $X\cap Y\cap Z$.