Is there an Intuitive Way to Remember Chain Rules for Entropy with 3+ Variables?


I'm thinking that the chain rule could be re-derived more easily, if forgotten, if there were a hierarchy to how we interpret the grouping of the variables, but I want to check the validity of these thoughts:

Consider first the two-variable case:

$$H(X|Y)=H(X,Y)-H(Y)$$ which extends to three variables in the following way: $$H(X|Y,Z)=H(X,Y,Z)-H(Y,Z)$$ by setting A=(Y,Z) and simply considering H(X|A). This suggests that we can read a series of joint variables as the leftmost variable intersected with (that is, taken jointly with) all the others to its right.

Now $$H(X,Y)=H(X|Y)+H(Y)$$ Using the same grouping as above, we can write: $$H(X,Y,Z)=H(X|Y,Z)+H(Y,Z)$$ which follows the exact pattern above via H(X,A), where A=(Y,Z). But let's rearrange and see what happens:

Consider now H(X,Y|Z), which is different in that the rightmost element is now a condition. This is where the "hierarchy" becomes relevant. I feel that you could think of this grouping in two ways: 1) the entropy of the intersection of (X and Y) given Z, or 2) the entropy of X intersected with (Y given Z). Not only is the first more consistent with the methods we have just used successfully, but the second runs into the issue that (Y given Z) is not actually a random variable.

Testing the theory:

$$H(X,Y|Z)=H(A|Z)=H(A,Z)-H(Z)=H(X,Y,Z)-H(Z)$$ with A=(X,Y), which is true, as can be verified here. So, in short, for those who don't "get" the shorthand for the chain rule, we usually resolve it by first grouping the variables that are joint and then separating out those that are conditioned on. You can then reduce each case to the typical two-variable case and substitute back at the end. I've simply been deriving/memorizing the formulas when I need them, but is this shorthand a valid interpretation?
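As a numerical sanity check of the identities above, here is a minimal Python sketch. The random joint distribution, the alphabet sizes, and the helper names (`entropy`, `marginal`, `cond_entropy`, `joint_entropy`) are illustrative assumptions of mine, not taken from any reference. The conditional entropy is computed directly from its definition, so comparing it against the "joint minus marginal" form is a genuine check rather than a tautology.

```python
import itertools
import math
import random


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, axes):
    """Marginal pmf over the given coordinate positions of each outcome tuple."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_entropy(joint, target, given):
    """H(target | given) from the definition: -sum p(t, g) * log2 p(t | g)."""
    tg = marginal(joint, target + given)
    g = marginal(joint, given)
    n = len(target)
    return -sum(p * math.log2(p / g[key[n:]]) for key, p in tg.items() if p > 0)


def joint_entropy(joint, axes):
    """Entropy of the grouped variable formed by the given coordinates."""
    return entropy(marginal(joint, axes))


# Hypothetical example: a random joint pmf over X in {0,1}, Y in {0,1,2}, Z in {0,1}.
random.seed(0)
outcomes = list(itertools.product(range(2), range(3), range(2)))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

X, Y, Z = (0,), (1,), (2,)  # coordinate positions; X + Y means the group (X, Y)

checks = {
    "H(X,Y|Z) = H(X,Y,Z) - H(Z)": (
        cond_entropy(joint, X + Y, Z),
        joint_entropy(joint, X + Y + Z) - joint_entropy(joint, Z),
    ),
    "H(X|Y,Z) = H(X,Y,Z) - H(Y,Z)": (
        cond_entropy(joint, X, Y + Z),
        joint_entropy(joint, X + Y + Z) - joint_entropy(joint, Y + Z),
    ),
    "H(X,Y,Z) = H(X|Y,Z) + H(Y,Z)": (
        joint_entropy(joint, X + Y + Z),
        cond_entropy(joint, X, Y + Z) + joint_entropy(joint, Y + Z),
    ),
}

for name, (lhs, rhs) in checks.items():
    print(f"{name}: {lhs:.6f} vs {rhs:.6f}")
```

Each printed pair should agree up to floating-point error, which is consistent with treating a comma-separated group such as (X,Y) or (Y,Z) as a single random variable A.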

TL;DR: $$H(X,Y|Z)\to H(A|Z); \quad A = (X,Y)$$ $$H(X|Y,Z)\to H(X|A); \quad A = (Y,Z)$$ $$H(X,Y,Z)\to H(X,A); \quad A = (Y,Z)$$
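If the grouping reading is right, applying the two-variable rule repeatedly should recover the full three-variable chain rule. A sketch of that, assuming nothing beyond the two-variable identities above:

$$H(X,Y,Z)=H(X|A)+H(A)=H(X|Y,Z)+H(Y,Z)=H(X|Y,Z)+H(Y|Z)+H(Z); \quad A = (Y,Z)$$

and subtracting H(Z) from both sides gives the conditional version:

$$H(X,Y|Z)=H(X,Y,Z)-H(Z)=H(X|Y,Z)+H(Y|Z)$$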