Understanding the conditional entropy derivation


Slide 5 of this lecture has the following derivation, and I don't understand the notation changes:

let $(X,Y) \sim p$.

For $x\in \operatorname{Supp}(X)$ the random variable $Y|X=x$ is well defined.

Q1: Why is the above relevant?

It continues: let $p_X$ and $p_{Y|X}$ be the marginal and conditional distributions induced by $p$.

$$H(Y|X) = \sum_{x \in X}p_X(x) H(Y|X=x)$$

Q2: Isn't the $p_X(x)$ notation redundant? Could it just be left as $p(x)$?

$$=-\sum_{x \in X} p_X(x) \sum_{y \in Y} p_{Y|X}(y|x) \log p_{Y|X}(y|x)$$

Q3: Same question... isn't $p_{Y|X}$ redundant too? I.e., could it just be $p(Y|X)$?

$$=-\sum_{x \in X,\, y \in Y} p(x,y)\log p_{Y|X}(y|x) \\ = -E_{(X,Y)}\big[\log p_{Y|X}(y|x)\big]$$

Q4: How can they rewrite $-\sum_{x \in X, y \in Y} p(x,y)$ as $E_{(X,Y)}$? Doesn't expectation require something like: $-\sum_{x \in X, y \in Y} p(x,y)f(x,y)$? (ie: a probability times a function)?


Q1: Why is the above relevant?

Not very relevant here. For any joint discrete probability function this is automatic: whenever $p_X(x)>0$, the conditional distribution of $Y$ given $X=x$ is well defined.

Q2: Isn't the $p_X(x)$ notation redundant? Could it just be left as $p(x)$?

No, not if we want to be precise and rigorous. True, we often write $p(x)$ and $p(y)$ to denote the (marginal) probabilities of $X$ and $Y$, but that's sloppy notation. In math, when we write $f(x)$, the function is identified by the letter $f$; $x$ is just a dummy variable. So if $f(x)=x^2$, then $f(y)=y^2$, $f(x+1)=(x+1)^2$, and so on: reusing $f()$ means it's the same function, and only its argument changes. To write a different function, we use another letter, say $g()$. Now, if we want to keep using the letter $p$ (or $f$ if we speak of densities), then to denote that these are different functions we add a subscript, so that $p_X()$ and $p_Y()$ (read each as a single two-letter name) are different functions.

True, $p_X()$ will normally be evaluated at values of the random variable $X$ (which we conventionally denote by $x$). But that is neither necessary nor enough to identify the function. Suppose I ask you: write down the probability that $X$ takes the value $x=3$. You would write $p(x)$ evaluated at $3$, i.e. $p(3)$. But then you cannot distinguish it from the probability that $Y$ takes the value $y=3$. With the correct notation there is no confusion: $p_X(3)$ and $p_Y(3)$.
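To make the distinction concrete, here is a small Python sketch (the joint pmf is made up purely for illustration): $p_X$ and $p_Y$ are two different functions, and both can be evaluated at the same number $3$ with different results.

```python
# Hypothetical joint pmf of (X, Y) on {1,2,3} x {1,2,3} -- made up for illustration.
joint = {
    (1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.05,
    (2, 1): 0.05, (2, 2): 0.20, (2, 3): 0.05,
    (3, 1): 0.10, (3, 2): 0.10, (3, 3): 0.30,
}

# p_X and p_Y are *different functions*, even though both take a single number.
def p_X(x):
    """Marginal pmf of X: sum the joint pmf over all y."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def p_Y(y):
    """Marginal pmf of Y: sum the joint pmf over all x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

print(p_X(3))  # 0.5
print(p_Y(3))  # 0.4
```

Writing $p(3)$ for either of these would be ambiguous; $p_X(3)$ and $p_Y(3)$ are not.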

See for example here.

Q3: Same question... isn't $p_{Y|X}$ redundant too? I.e., could it just be $p(Y|X)$?

Same answer. See also here.

Q4: How can they rewrite $-\sum_{x \in X, y \in Y} p(x,y)$ as $E_{(X,Y)}$? Doesn't expectation require something like: $-\sum_{x \in X, y \in Y} p(x,y)f(x,y)$? (ie: a probability times a function)?

Exactly. The expectation of something is the sum of the probability multiplied by that something. In one variable: $E[g(X)] = \sum_x g(x)\, p_X(x)$.

Hence

$$ E [ -\log(p_X(X))] = \sum_x \underbrace{(- \log(p_X(x)))}_\text{g(x)} \, p_X(x)=- \sum_x \log(p_X(x)) p_X(x)$$

The formula you wrote is exactly the same thing in two variables, with $g(x,y) = -\log p_{Y|X}(y|x)$:

$$E\big[-\log p_{Y|X}(Y|X)\big] = -\sum_{x \in X,\, y \in Y} p(x,y) \log p_{Y|X}(y|x)$$
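As a numerical sanity check, here is a short Python sketch (again with a made-up joint pmf) showing that the two forms in the slide's derivation agree: summing $p_X(x)\,H(Y|X=x)$ over $x$ gives the same number as the joint expectation $-\sum_{x,y} p(x,y)\log p_{Y|X}(y|x)$.

```python
import math

# Hypothetical joint pmf of (X, Y), for illustration only; any valid pmf works.
joint = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.10, (1, 1): 0.40,
}

def p_X(x):
    """Marginal pmf of X."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def p_cond(y, x):
    """Conditional pmf p_{Y|X}(y|x)."""
    return joint[(x, y)] / p_X(x)

xs = {x for x, _ in joint}
ys = {y for _, y in joint}

# Form 1: H(Y|X) = sum_x p_X(x) * H(Y|X=x)
h1 = sum(
    p_X(x) * -sum(p_cond(y, x) * math.log2(p_cond(y, x)) for y in ys)
    for x in xs
)

# Form 2: H(Y|X) = -E_{(X,Y)}[log p_{Y|X}(Y|X)]
#               = -sum_{x,y} p(x,y) * log p_{Y|X}(y|x)
h2 = -sum(p * math.log2(p_cond(y, x)) for (x, y), p in joint.items())

print(h1, h2)  # the two values coincide
```

The agreement is exactly the algebraic step in the derivation: $p_X(x)\,p_{Y|X}(y|x) = p(x,y)$, so pulling the marginal inside the inner sum turns form 1 into form 2.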