Conditional Entropy: if $H[y|x]=0$, then there exists a function $g$ such that $y=g(x)$


Suppose that the conditional entropy $H[y|x]$ of two discrete random variables $x$ and $y$ is zero.

Show that, for all values of $x$ such that $p(x) > 0$, the variable $y$ must be a function of $x$; in other words, for each $x$ there is only one value of $y$ such that $p(y|x)\ne 0$.

Therefore, if $H[y|x]=0$, then there exists a function $g$ such that $y=g(x)$.

(Figure: entropy Venn diagram.)

Would it be sufficient to prove this by looking at the Venn diagram and noticing that if $H[y|x]=0$, then:

$$H[y] = H[x] - H[x|y]$$

Therefore, $H[y|x]=0$ if and only if the value of $y$ is completely determined by the value of $x$.

I want to say that $H[x|y]$ depends only on the variable $x$ and not on the variable $y$... but I am not sure how to prove it.

I guess the question is: how can one justify such a bold statement as the one above, namely that $H[y|x]=0$ if and only if $y$ is a function of $x$?

There are two solutions below.

Solution 1.

Based on the definition of conditional entropy:

$$H[y|x] = - \sum_{x_i} \sum_{y_j} p(x_i, y_j) \ln p(y_j | x_i)$$
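This definition can be checked numerically. Below is a minimal sketch in Python; the helper `conditional_entropy` and the example joint distribution are illustrative, not part of the answer:

```python
import math

def conditional_entropy(joint):
    """H[y|x] = -sum_{x,y} p(x,y) ln p(y|x), with 0 * ln 0 treated as 0.

    `joint` maps (x, y) pairs to probabilities p(x, y).
    """
    # Marginal p(x) obtained by summing the joint over y
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    h = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # terms with p(x, y) = 0 contribute nothing
            h -= p * math.log(p / px[x])  # p(y|x) = p(x,y) / p(x)
    return h

# Hypothetical example: x and y independent fair bits, so y is NOT
# a function of x and H[y|x] should be ln 2.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(conditional_entropy(joint))  # ≈ 0.6931 (= ln 2)
```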

Considering the property of probability, we can obtain that:

$$0 \le p(y_j|x_i) \le 1$$

$$0 \le p(x_i, y_j) \le 1$$

Therefore, since $\ln p(y_j|x_i) \le 0$ when

$$0 < p(y_j|x_i) \le 1$$

(the graph of $y=\ln(x)$ is non-positive on $(0, 1]$), we can see that:

$$-p(x_i, y_j) \ln p(y_j | x_i) \ge 0$$


when:

$$p(y_j|x_i) = 0$$

using the fact that:

$$\lim_{p \to 0} p \ln p = 0$$

we can see that:

$$-p(x_i,y_j) \ln p(y_j | x_i) = -p(x_i) p(y_j|x_i) \ln p(y_j|x_i) = 0$$

(here $p(x_i)$ is a constant factor, so the limit applies to $p(y_j|x_i) \ln p(y_j|x_i)$).
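The limit $p \ln p \to 0$ can be seen numerically; a quick sketch:

```python
import math

# Numerical check that p * ln p -> 0 as p -> 0+, which justifies the
# convention 0 * ln 0 = 0 used in the entropy sum.
values = [p * math.log(p) for p in (1e-2, 1e-4, 1e-8, 1e-16)]
print(values)  # terms are negative and shrink toward 0
```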

Hence every term of the sum above is non-negative. In other words, $H[y|x]$ equals $0$ if and only if every term $-p(x_i, y_j) \ln p(y_j|x_i)$ equals $0$.

Therefore, for each possible value of the random variable $x$, denoted $x_i$:

$$-\sum_{y_j} p(x_i, y_j) \ln p(y_j|x_i) = 0$$

Suppose, for contradiction, that there is more than one possible value $y_j$ of the random variable $y$ such that:

$$p(y_j|x_i) \ne 0$$

(Because $x_i$ and $y_j$ are both "possible", $p(x_i, y_j)$ is also nonzero.) Constrained by:

$$0 \le p(y_j|x_i) \le 1$$

and

$$\sum_j p(y_j|x_i)=1$$

there would then be at least two values of $y$ that satisfy:

$$0 < p(y_j|x_i) < 1$$

which ultimately leads to:

$$-\sum_{y_j} p(x_i, y_j) \ln p(y_j|x_i) > 0$$

contradicting the equality above.

Therefore, for each possible value of $x$, there is only one $y$ such that:

$$p(y|x) \ne 0$$

In other words, $y$ is determined by $x$.

Note: if $y$ is a function of $x$, we can obtain the value of $y$ as soon as we observe $x$. Therefore we gain no additional information when observing $y$ given an already observed $x$.
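Both directions of the argument can be sanity-checked numerically. A minimal sketch (the two joint distributions are hypothetical examples; `cond_entropy` is an illustrative helper, not part of the answer):

```python
import math

def cond_entropy(joint):
    # H[y|x] with the 0 * ln 0 = 0 convention; joint maps (x, y) -> p(x, y).
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    h = 0.0
    for (x, _), p in joint.items():
        if p > 0:
            h -= p * math.log(p / px[x])  # p(y|x) = p(x,y) / p(x)
    return h

# Exactly one y with p(y|x) != 0 for each x: y is a function of x
deterministic = {(0, 0): 0.3, (1, 1): 0.2, (2, 0): 0.5}
print(cond_entropy(deterministic))  # 0.0

# Two values of y share probability mass given x = 0, so H[y|x] > 0
noisy = {(0, 0): 0.3, (0, 1): 0.3, (1, 1): 0.4}
print(cond_entropy(noisy))  # strictly positive
```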

Solution 2.

Write $\mathcal{X}$ and $\mathcal{Y}$ for the ranges of $X$ and $Y$, respectively. Write $H[Y|X]$ as

$$ H[Y|X] = \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \left( -p(x) p(y|x) \log p(y | x) \right) \tag{1}$$

Here, $0 \log 0$ is regarded as $0$ as usual. Now assume that $H[Y|X] = 0$. Since each summand of $\text{(1)}$ is non-negative, this implies that each summand vanishes, and so, we get

$$ p(y|x) \log p(y | x) = 0 \quad \text{whenever} \quad p(x) > 0. $$

But since the function $t \mapsto t \log t$ on $[0, 1]$ has exactly two zeros $t = 0$ and $t =1$, this implies that

$$ p(y|x) \in \{0, 1\} \quad \text{whenever} \quad p(x) > 0. $$

Still assuming that $x$ satisfies $p(x) > 0$, the identity $\sum_{y\in\mathcal{Y}} p(y|x) = 1$ ensures that there exists a unique $y \in \mathcal{Y}$ such that $p(y|x) = 1$ holds. In such case, we write $y = g(x)$. We also remark that $p(y'|x) = 0$ if $y' \neq g(x)$.

To extend $g$ to a function on all of $\mathcal{X}$, assign to $g(x)$ an arbitrary value in $\mathcal{Y}$ whenever $p(x) = 0$. (Such assignment will never harm the argument.) This yields a function $g : \mathcal{X} \to \mathcal{Y}$. We claim that

Claim. $Y = g(X)$ with probability one.

Indeed, recall that $g$ is defined so as to satisfy

$$ p(y|x) = \mathbf{1}_{\{y = g(x)\}} := \begin{cases} 1, & \text{if $y = g(x)$} \\ 0, & \text{if $y \neq g(x)$} \end{cases} \tag{2} $$

whenever $p(x) > 0$. (Here, indicator function notation is used.) From this,

\begin{align*} \mathbb{P}(Y = g(X)) &= \mathbb{E}[\mathbf{1}_{\{Y = g(X)\}}] \\ &= \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} p(x,y) \mathbf{1}_{\{y = g(x)\}} \\ &= \sum_{x\in\mathcal{X}} p(x)\sum_{y\in\mathcal{Y}} p(y|x) \mathbf{1}_{\{y = g(x)\}} \\ &\stackrel{\text{(2)}}= \sum_{x\in\mathcal{X}} p(x)\sum_{y\in\mathcal{Y}} p(y|x) \\ &= \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} p(x, y) \\ &= 1. \end{align*}
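The construction of $g$ can be sketched in code: for each $x$ with $p(x) > 0$, pick the unique $y$ with $p(y|x) = 1$, then check that $\mathbb{P}(Y = g(X)) = 1$. The joint distribution below is a hypothetical example chosen so that this condition holds:

```python
# Hypothetical joint distribution in which y is determined by x
joint = {("a", 1): 0.5, ("b", 2): 0.3, ("c", 1): 0.2}

# Marginal p(x)
px = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p

# g(x) = the unique y with p(y|x) = 1 (only defined where p(x) > 0)
g = {x: y for (x, y), p in joint.items() if abs(p / px[x] - 1.0) < 1e-12}

# P(Y = g(X)) = sum of p(x, y) over pairs with y = g(x)
prob_y_equals_gx = sum(p for (x, y), p in joint.items() if g.get(x) == y)
print(g, prob_y_equals_gx)  # the mapping g, and total probability ≈ 1.0
```

Note that, as in the answer, $g$ can be extended arbitrarily to values of $x$ with $p(x) = 0$ without affecting $\mathbb{P}(Y = g(X))$.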