I have searched around for quite some time and I am still struggling to wrap my head around a problem. My goal is to calculate the joint entropy of 32 random variables, which represent the 32 bits of an instruction word, i.e. the total entropy in all 32 bits. I have the probability of each bit being high, and also the conditional probability of each bit being high given that one of the other bits is high.
This is stored as a matrix where each column represents a bit position and each row represents the bit position it is conditioned on.
I.e. mat(0,0) = probability of bit0=1
mat(0,21) = probability of bit21=1 given that bit0=1
mat(16,31) = probability of bit31=1 given that bit16=1
Now, for simplicity, let's try to calculate the joint entropy of bit0 and bit1 (X1 and X2):
$$H(X1,X2) = H(X1) + H(X2|X1)$$
Where $$H(X1) = \sum_{x1}{-p(x1) * \log{p(x1)}}$$ $$H(X2|X1) = \sum_{x1}{p(x1)\sum_{x2}{-p(x2|x1) * \log(p(x2|x1))}}$$
This is all well and good, because p(x2|x1) can be read directly out of my probability matrix. What I don't understand is how to move beyond this.
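The two-bit case can be sketched in Python as follows. This is only a sketch under some assumptions: entropy is measured in bits (base-2 logarithm), the function names are made up for illustration, and since the matrix only stores probabilities conditioned on a bit being 1, the probability conditioned on that bit being 0 is recovered from the law of total probability.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a single bit that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def pair_entropy(p_i, p_j, p_j_given_i):
    """H(Xi, Xj) = H(Xi) + H(Xj | Xi) for two bits.

    p_i         = P(Xi = 1)            (diagonal entry mat(i, i))
    p_j         = P(Xj = 1)            (diagonal entry mat(j, j))
    p_j_given_i = P(Xj = 1 | Xi = 1)   (entry mat(i, j))
    """
    if p_i < 1.0:
        # Law of total probability:
        # P(Xj=1) = P(Xj=1|Xi=1) P(Xi=1) + P(Xj=1|Xi=0) P(Xi=0)
        p_j_given_not_i = (p_j - p_j_given_i * p_i) / (1 - p_i)
    else:
        p_j_given_not_i = 0.0  # weighted by P(Xi=0) = 0, so the value is irrelevant
    h_cond = (p_i * binary_entropy(p_j_given_i)
              + (1 - p_i) * binary_entropy(p_j_given_not_i))
    return binary_entropy(p_i) + h_cond

# Two independent fair bits carry 2 bits of entropy in total.
print(pair_entropy(0.5, 0.5, 0.5))
# If bit j always equals bit i, the pair carries only 1 bit.
print(pair_entropy(0.5, 0.5, 1.0))
```

With the matrix from the question, the call would be `pair_entropy(mat[0][0], mat[1][1], mat[0][1])`, assuming `mat` is indexed as described above.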
$$H(X1,X2,X3) = H(X1) + H(X2|X1) + H(X3|X1,X2)$$ How do I calculate the last term? If I try to decompose it into an expression in terms of probabilities, I get
$$H(X3|X1,X2) = \sum_{x1,x2}{p(x1,x2)\sum_{x3}{-p(x3|x1,x2) * \log(p(x3|x1,x2))}}$$
And I am really unsure about $p(x3|x1,x2)$. I can write
$$p(x3|x1,x2) = \frac{p(x1,x2,x3)}{p(x1,x2)}$$
The denominator is OK, since $p(x1,x2) = p(x1)\,p(x2|x1)$. But how do I get the joint probability $p(x1,x2,x3)$ when all the bits depend on each other?
I have the following probability matrix:
p(x1) p(x2|x1) p(x3|x1)
p(x1|x2) p(x2) p(x3|x2)
p(x1|x3) p(x2|x3) p(x3)
Is there any way for me to calculate p(x1,x2,x3) from this?
Consider two examples:
i) $x_1\sim Bernoulli(1/2), x_2\sim Bernoulli(1/2), x_3\sim Bernoulli(1/2)$ are independent.
ii) Same marginals, $x_1, x_2$ are independent, but $x_3 = x_1 ~xor~ x_2$.
Both seem to have the same values for
p(x1) p(x2|x1) p(x3|x1)
p(x1|x2) p(x2) p(x3|x2)
p(x1|x3) p(x2|x3) p(x3)
if I'm not mistaken.
In the first case $p(x_3|x_1, x_2) = p(x_3)$. However, in the second case, $p(x_3|x_1, x_2) = 1_{[x_3 = x_1~xor~x_2]}$. So it seems like the given probability matrix is not enough to determine $p(x_1, x_2, x_3)$, and hence not enough to determine the joint entropy.
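A quick numerical check of the two examples, as a sketch: it builds both joint distributions explicitly, derives the pairwise matrix (same layout as in the question, with `mat[i][j]` being the probability that bit j is 1 given that bit i is 1), and compares the joint entropies in bits. The function names are made up for illustration.

```python
import itertools
import math

def joint_independent():
    """Example i): three independent fair bits."""
    return {bits: 1 / 8 for bits in itertools.product((0, 1), repeat=3)}

def joint_xor():
    """Example ii): x1, x2 independent fair bits, x3 = x1 xor x2."""
    return {
        (x1, x2, x3): (0.25 if x3 == (x1 ^ x2) else 0.0)
        for x1, x2, x3 in itertools.product((0, 1), repeat=3)
    }

def pairwise_matrix(joint):
    """mat[i][j] = P(bit j = 1 | bit i = 1); the diagonal holds P(bit i = 1)."""
    n = 3
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        p_i = sum(p for bits, p in joint.items() if bits[i] == 1)
        for j in range(n):
            p_ij = sum(p for bits, p in joint.items()
                       if bits[i] == 1 and bits[j] == 1)
            mat[i][j] = p_i if i == j else p_ij / p_i
    return mat

def joint_entropy(joint):
    """H(X1, X2, X3) in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)

print(pairwise_matrix(joint_independent()))  # every entry is 0.5
print(pairwise_matrix(joint_xor()))          # every entry is 0.5 as well
print(joint_entropy(joint_independent()))    # 3.0
print(joint_entropy(joint_xor()))            # 2.0
```

The two matrices are identical, yet the joint entropies differ by one bit, so no formula that takes only the matrix as input can return the right answer for both cases.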