Understanding the proof of Fano's inequality


Suppose we have the Markov chain

\begin{align*} X \rightarrow Y \rightarrow \hat X \end{align*}

where $X$ is a random variable, $Y$ is a random variable correlated with $X$, and $\hat X$ is our estimate of $X$.

Using Fano's inequality we can bound the error probability $P_e = \mathbb{P}[\hat X \neq X]$ in terms of the conditional entropy $H(X\mid \hat X)$. Combined with the data-processing inequality, the result will be:

\begin{align*} P_e \ge \frac{H(X\mid Y) - 1}{\log_2 |\mathcal{X}|} \end{align*}
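As a quick numeric sanity check of the bound, here is a toy example (my own construction, not from the question): $X$ uniform on $\{0,1,2\}$, $Y$ a noisy copy of $X$, and the estimator $\hat X(y) = y$.

```python
import math

# Made-up symmetric channel (hypothetical values, not from the question):
# X is uniform on {0, 1, 2}; Y equals X with probability 1/2 and is
# uniform over the other two symbols otherwise; the estimate is X_hat = Y.
symbols = [0, 1, 2]
p_xy = {(x, y): (1/3) * (0.5 if x == y else 0.25)
        for x in symbols for y in symbols}   # joint pmf P(X=x, Y=y)

# H(X|Y) in bits, by direct enumeration over the joint pmf.
p_y = {y: sum(p for (x, yy), p in p_xy.items() if yy == y) for y in symbols}
h_xy = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)

p_e = sum(p for (x, y), p in p_xy.items() if x != y)  # error prob. of X_hat = Y
bound = (h_xy - 1) / math.log2(len(symbols))          # Fano lower bound on P_e

print(h_xy, p_e, bound)   # h_xy ≈ 1.5, p_e ≈ 0.5, bound ≈ 0.315
assert p_e >= bound
```

Here $H(X\mid Y) = 1.5$ bits and $P_e = 0.5 \ge (1.5 - 1)/\log_2 3 \approx 0.315$, so the bound holds and is nontrivial.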

However, part of the proof requires showing that

\begin{align*} H(E,X \mid \hat X) &= H(X\mid \hat X) + \underbrace{H(E \mid X, \hat X)}_{= 0} \\ &= H(E\mid \hat X) + H(X \mid E, \hat X) \end{align*}

where $E$ is a random variable indicating whether $X$ equals $\hat X$:

\begin{align*} E = \begin{cases}1 & \text{if } X \neq \hat X \\ 0 & \text{if } X = \hat X \end{cases} \end{align*}

Since the underbraced term vanishes, equating the two expansions of $H(E,X \mid \hat X)$ gives:

\begin{align*} H(X\mid \hat X) = H(E\mid \hat X) + H(X \mid E, \hat X) \end{align*}

And since $H(E\mid \hat X) \leq H(E) = H(P_e)$ (conditioning cannot increase entropy, and $E$ is Bernoulli with parameter $P_e$), we immediately see that

\begin{align*} H(X\mid \hat X) \leq H(P_e) + H(X \mid E, \hat X) \end{align*}
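The chain-rule identity used here, $H(X\mid \hat X) = H(E\mid \hat X) + H(X\mid E, \hat X)$, can be verified numerically on a toy joint distribution (my own, hypothetical):

```python
import math

# Numeric check of H(X|X_hat) = H(E|X_hat) + H(X|E,X_hat) on a made-up
# joint pmf (hypothetical, not from the textbook): X uniform on {0,1,2},
# X_hat equal to X with probability 1/2, else uniform over the rest.
p = {(x, xh): (1/3) * (0.5 if x == xh else 0.25)
     for x in range(3) for xh in range(3)}

def cond_entropy(joint, a, b):
    """H(A|B) in bits; a(w) and b(w) read the values of A and B off outcome w."""
    pb, pab = {}, {}
    for w, pw in joint.items():
        pb[b(w)] = pb.get(b(w), 0.0) + pw
        pab[(a(w), b(w))] = pab.get((a(w), b(w)), 0.0) + pw
    return -sum(pw * math.log2(pw / pb[bv])
                for (_, bv), pw in pab.items() if pw > 0)

E = lambda w: int(w[0] != w[1])   # E = 1 iff X != X_hat

lhs = cond_entropy(p, a=lambda w: w[0], b=lambda w: w[1])              # H(X|X_hat)
rhs = (cond_entropy(p, a=E, b=lambda w: w[1])                          # H(E|X_hat)
       + cond_entropy(p, a=lambda w: w[0], b=lambda w: (E(w), w[1])))  # H(X|E,X_hat)
assert abs(lhs - rhs) < 1e-12
```

Both sides come out to $1.5$ bits on this example ($1$ bit from $H(E\mid\hat X)$ plus $0.5$ bits from $H(X\mid E,\hat X)$).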

Here comes the part I do not quite understand, concerning $H(X \mid E, \hat X)$. The textbook states that

\begin{align*} H(X\mid E, \hat X) &= \mathbb{P}[E=0]H(X \mid \hat X, E = 0) + \mathbb{P}[E=1]H(X\mid \hat X, E = 1) \\ &\leq (1 - P_e) \cdot 0 + P_e \log_2 |\mathcal{X}| \end{align*}

Hence

\begin{align*} H(P_e) + P_e \log_2 |\mathcal{X}| \geq H(X\mid \hat X) \geq H(X \mid Y) \end{align*}

where the last inequality follows from the data-processing inequality.
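The two facts this derivation relies on, $H(X \mid \hat X, E = 0) = 0$ and $H(X \mid E, \hat X) \leq P_e \log_2 |\mathcal{X}|$, can likewise be checked numerically on a toy example (my own, hypothetical):

```python
import math

# Made-up joint pmf (hypothetical, not from the textbook): X uniform on
# {0,1,2}, X_hat equal to X with probability 1/2, else uniform over the rest.
p = {(x, xh): (1/3) * (0.5 if x == xh else 0.25)
     for x in range(3) for xh in range(3)}

def entropy(dist):
    """Entropy in bits of a probability vector given as a dict of masses."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def cond_dist_x(cond):
    """P(X = . | cond) as a dict, for a predicate cond on outcomes w = (x, xh)."""
    sub = {w: pw for w, pw in p.items() if cond(w)}
    z = sum(sub.values())
    out = {}
    for (x, _), pw in sub.items():
        out[x] = out.get(x, 0.0) + pw / z
    return out

p_e = sum(pw for (x, xh), pw in p.items() if x != xh)   # = 0.5 here

# Given E = 0 and X_hat = xh, X is a point mass at xh, so each term is 0.
for xh in range(3):
    assert entropy(cond_dist_x(lambda w: w[1] == xh and w[0] == xh)) == 0.0

# H(X | E, X_hat) as a weighted average over the pairs (e, xh):
h = sum(sum(pw for w, pw in p.items() if w[1] == xh and int(w[0] != w[1]) == e)
        * entropy(cond_dist_x(lambda w: w[1] == xh and int(w[0] != w[1]) == e))
        for e in (0, 1) for xh in range(3))
assert h <= p_e * math.log2(3)   # 0.5 <= 0.5 * log2(3)
```

On this example $H(X\mid E,\hat X) = 0.5$ bits, comfortably below $P_e \log_2 3 \approx 0.79$: conditioned on an error, $X$ is uniform over the two remaining symbols, not all three.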


What I am not getting is why $H(X\mid E, \hat X)$ can be written as

\begin{align*} H(X\mid E, \hat X) &= \mathbb{P}[E=0]H(X \mid \hat X, E = 0) + \mathbb{P}[E=1]H(X\mid \hat X, E = 1) \end{align*}

and also why it is bounded by $P_e \log_2 |\mathcal{X}|$. The equation tells me that $H(X \mid \hat X, E = 0) = 0$, and my textbook argues that

"Since given $E = 0$, we have $X = \hat X$."

but I still don't see why this equals $0$.


Further, I stated above that $H(E \mid X, \hat X) = 0$, which is justified because $E$ is a function of $X$ and $\hat X$. But why is $H(X \mid E, \hat X) \neq 0$?

Intuitively I'd say $H(E \mid X, \hat X) = 0$ because once I know $X$ and $\hat X$ I also know $E$, so there is no uncertainty left. However, the same reasoning does not apply to $H(X \mid E, \hat X)$. How can I argue this?

---

> What I am not getting is why $H(X\mid E, \hat X)$ can be written as \begin{align*} H(X\mid E, \hat X) &= \mathbb{P}[E=0]H(X \mid \hat X, E = 0) + \mathbb{P}[E=1]H(X\mid \hat X, E = 1) \end{align*}

It's a property (sometimes taken as the definition) of conditional entropy: the conditional entropy equals the expected value of the entropy conditioned on the particular values of the conditioning variable:

$$H( X | Y) = \sum_y P(Y=y) H(X | Y=y)$$

The formula above simply applies this property to the conditioning variable $E$, which takes the two values $0$ and $1$.
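A small numeric illustration of this property, with made-up probabilities (hypothetical, not from the question):

```python
import math

# Check H(X|Y) = sum_y P(Y=y) H(X|Y=y) on a small made-up joint pmf.
p = {('a', 1): 0.4, ('b', 1): 0.2, ('a', 2): 0.1, ('b', 2): 0.3}  # P(X=x, Y=y)
p_y = {y: sum(pw for (x, yy), pw in p.items() if yy == y) for y in (1, 2)}

# Direct definition: H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y)
direct = -sum(pw * math.log2(pw / p_y[y]) for (x, y), pw in p.items())

def entropy(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)

# As an expectation over Y of the per-value entropies H(X | Y=y):
avg = sum(p_y[y] * entropy([p[(x, y)] / p_y[y] for x in ('a', 'b')])
          for y in (1, 2))

assert abs(direct - avg) < 1e-12
```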


> $$H(X \mid \hat X, E = 0) = 0$$

This can be seen as follows: $E = 0$ means there was no error in the estimation; hence, knowing this and knowing the estimated value $\hat X$, we know the value of $X$ exactly ($X = \hat X$), so there is no remaining uncertainty about $X$ and the entropy is zero.

I'm not sure if this answers all your doubts.

---

I think two issues may be confusing you: 1) the expansion of the conditional entropy, and 2) what it means to say that "$E$ is a function of $X$ and $\hat X$".


To illustrate the definition of conditional entropy in detail, here is a simple, concrete example. On page 30 of *Elements of Information Theory* there is an example with a joint distribution table for $(X, Y)$ (image of the table omitted). There the author calculates $H(X|Y)$ as follows:

$H(X|Y)=\frac{3}{4}H(X|Y=1)+\frac{1}{4}H(X|Y=2)$

That is to say: $H(X|Y)=Pr(Y=1)H(X|Y=1)+Pr(Y=2)H(X|Y=2)$

This matches the definition of conditional entropy given on Wikipedia:

$H(Y|X)=\sum_{x\in\mathcal{X}}p(x)H(Y|X=x) $

The same applies to the equation you asked about:

\begin{align*} H(X\mid E, \hat X) &= \mathbb{P}[E=0]H(X \mid \hat X, E = 0) + \mathbb{P}[E=1]H(X\mid \hat X, E = 1) \end{align*}


Secondly, the statement that "$E$ is a function of $X$ and $\hat X$" refers to the definition of $E$ given in the question (the image showed that definition). Clearly $E$ is a function of $X$ and $\hat X$: given any fixed values of $X$ and $\hat X$, the value of $E$ is determined, hence the conditional entropy $H(E|X,\hat X)$ is $0$.
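This can be checked numerically on a toy joint distribution (my own, hypothetical):

```python
import math

# H(E | X, X_hat) = 0: E = 1{X != X_hat} is a deterministic function of
# (X, X_hat). Checked on a made-up joint pmf (hypothetical, not from the
# answer): X uniform on {0,1,2}, X_hat correct with probability 1/2.
p = {(x, xh): (1/3) * (0.5 if x == xh else 0.25)
     for x in range(3) for xh in range(3)}

# Extend to a joint pmf over (X, X_hat, E); E is determined by (X, X_hat).
pj = {(x, xh, int(x != xh)): pw for (x, xh), pw in p.items()}

# H(E | X, X_hat) = -sum p(x,xh,e) log2 p(e | x,xh); every conditional
# probability p(e | x,xh) that actually occurs equals 1, so each term is 0.
h = -sum(pw * math.log2(pw / p[(x, xh)]) for (x, xh, e), pw in pj.items())
assert h == 0.0
```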

Given $E=0$, which means $X=\hat X$ (by the definition of $E$ above), the conditional entropy is

\begin{align*} H(X \mid \hat X, E=0)=H(X, \hat X\mid E=0)-H(\hat X\mid E=0)\overset{X=\hat X}{=}H(\hat X\mid E=0) - H(\hat X\mid E=0)=0 \end{align*}

However, for $E=1$ we only have \begin{align*} H(X \mid \hat X, E=1)=H(X, \hat X\mid E=1)-H(\hat X\mid E=1), \end{align*} which is in general nonzero: knowing $\hat X$ and knowing that an error occurred ($X \neq \hat X$) still leaves up to $|\mathcal{X}|-1$ possible values for $X$.