I am studying basic language models before moving on to CRFs, using http://www.eng.utah.edu/~cs6961/papers/klinger-crf-intro.pdf.
But I am stuck on page 6.
It gives the conditional entropy as
$H(y|x)=-\sum_{(x,y) \in Z} p(y,x) \log p(y|x)$
Shouldn't it be
$H(y|x)=-\sum_{(x,y) \in Z} p(y|x) \log p(y|x)$
Why is it $p(y,x)$ instead of $p(y|x)$? (https://en.wikipedia.org/wiki/Entropy_(information_theory))
Can the two be used interchangeably?
Thank you.
The correct definition is the one appearing in the text. You may understand the conditional entropy as follows.

Let $Y$ and $X$ be random variables with a joint distribution $p_{X,Y}(x,y)$. Given $X=x_0$, $Y$ is described by the conditional distribution $p_{Y|X}(y|x_0)$ and has entropy $$ H(Y|X=x_0) = -\sum_y p_{Y|X}(y|x_0) \log p_{Y|X}(y|x_0). $$

Note that the notation $H(Y|X=x_0)$ is unconventional (although it appears every now and then). Here it only serves as a reminder that we are computing the entropy of $Y$ using the standard definition; however, since we know that $X=x_0$, we use the conditional distribution of $Y$ given $X=x_0$ (instead of the marginal distribution $p_Y(y)$).
Now the conditional entropy of $Y$ given $X$ (not $Y$ given $X=x_0$; note the difference) is the average of $H(Y|X=x_0)$ over all possible values $x_0$, weighted by $p_X(x_0)$, that is
$$ \begin{align} H(Y|X) &= \sum_{x_0}p_X(x_0) H(Y|X=x_0) \\ &=-\sum_{x_0} \sum_y p_X(x_0)p_{Y|X}(y|x_0) \log p_{Y|X}(y|x_0) \\ &=-\sum_{x_0} \sum_y p_{Y,X}(y,x_0) \log p_{Y|X}(y|x_0), \end{align} $$ which is the same formula as the one in your text.
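To see concretely that the two formulas are not interchangeable, here is a quick numerical check on a made-up $2\times 2$ joint distribution (the probabilities are arbitrary, chosen only for illustration):

```python
import math

# Toy joint distribution p(x, y) over X in {0, 1}, Y in {0, 1};
# the four probabilities sum to 1.
p_joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.3,
}

# Marginal p(x) and conditional p(y|x) derived from the joint.
p_x = {x: sum(p for (xx, _), p in p_joint.items() if xx == x) for x in (0, 1)}
p_y_given_x = {(x, y): p_joint[(x, y)] / p_x[x] for (x, y) in p_joint}

# Correct conditional entropy: weight log p(y|x) by the JOINT p(y, x).
H_correct = -sum(p_joint[(x, y)] * math.log(p_y_given_x[(x, y)])
                 for (x, y) in p_joint)

# The formula proposed in the question: weight by p(y|x) instead.
H_wrong = -sum(p_y_given_x[(x, y)] * math.log(p_y_given_x[(x, y)])
               for (x, y) in p_joint)

# Averaging the per-x0 entropies H(Y|X=x0) with weights p(x0)
# reproduces the correct value, as in the derivation above.
H_avg = sum(p_x[x] * -sum(p_y_given_x[(x, y)] * math.log(p_y_given_x[(x, y)])
                          for y in (0, 1))
            for x in (0, 1))

print(H_correct, H_avg, H_wrong)
```

`H_correct` and `H_avg` agree exactly, while `H_wrong` differs. Intuitively, the weights $p(y|x)$ sum to the number of $x$ values rather than to $1$, so the question's formula is not a valid expectation over $(x,y)$ pairs at all.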