Fano's inequality explained intuitively?


I am now reading through a book to understand Fano's inequality, but I remember my professor explaining it in a certain way that made it seem so logical.

I will go to office hours as soon as possible, but for now, can someone please explain Fano's inequality to me, not through the math but in a "logical" way that makes sense?

Thanks a lot!!


Best answer:

Suppose that we wish to estimate a random variable $X$ by an estimator $\hat{X}$. Furthermore, assume that $\mathbb{P}(\hat{X} \neq X) = \epsilon$. The question is what we can say about the conditional entropy $H(X\mid\hat{X})$. Intuitively, if $\epsilon$ is very small, then $H(X\mid\hat{X})$ should also be very small. Fano's inequality quantifies this intuition: $$ H(X\mid\hat{X})\leq H(\epsilon)+\epsilon\log\left(|\mathcal{X}|-1\right) $$ where $\mathcal{X}$ is the alphabet of $X$, $|\mathcal{X}|$ is the cardinality of $\mathcal{X}$, and $H(\epsilon)$ is the binary entropy of $\epsilon$.
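As a sanity check, the inequality can be verified numerically. The sketch below (the joint distribution is made up for illustration) computes $\epsilon$, $H(X\mid\hat{X})$, and the Fano bound for a small 3-symbol alphabet with a mostly-correct estimator:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(X = i, X_hat = j) over a 3-symbol
# alphabet; the heavy diagonal means the estimator is usually right.
joint = [
    [0.30, 0.02, 0.01],
    [0.02, 0.30, 0.02],
    [0.01, 0.02, 0.30],
]
n = len(joint)

# Error probability eps = P(X_hat != X): mass off the diagonal.
eps = sum(joint[i][j] for i in range(n) for j in range(n) if i != j)

# Conditional entropy H(X | X_hat) = sum_j P(X_hat = j) * H(X | X_hat = j).
h_cond = 0.0
for j in range(n):
    p_xhat = sum(joint[i][j] for i in range(n))
    h_cond += p_xhat * entropy([joint[i][j] / p_xhat for i in range(n)])

# Fano's bound: H(eps) + eps * log2(|X| - 1).
fano = entropy([eps, 1 - eps]) + eps * math.log2(n - 1)

print(f"H(X|X_hat) = {h_cond:.4f} bits, Fano bound = {fano:.4f} bits")
assert h_cond <= fano
```

Since Fano's inequality holds for any estimator, the final assertion passes for any valid joint distribution you substitute.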

You can find more information in this book.


I encourage you to think in terms of how many bits you need to transmit in order to know $X$ (a random variable) from $\hat{X}$ (an estimate).

$H(X|\hat{X})$ is the average number of bits we need to transmit so that a receiver who already knows $\hat{X}$ can recover $X$. We want to upper bound this.

Imagine we first spend some bits communicating whether $X$ equals $\hat{X}$ or not. This is a binary event with distribution $\{P_e, 1-P_e\}$, i.e., we need to transmit $H(P_e)$ bits on average to do this.

If it's NOT $\hat{X}$, then it could be any one of the other $|\mathcal{X}|-1$ symbols in the alphabet. The WORST-case length for identifying it is $\log(|\mathcal{X}|-1)$ bits, and this case occurs with probability $P_e$.

So we have $$H(X|\hat{X}) \le H(P_e)+P_e\log(|\mathcal{X}|-1) \le H(P_e)+P_e\log(|\mathcal{X}|)$$
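The coding argument above can be tabulated directly. This sketch (the alphabet size $|\mathcal{X}| = 8$ is an arbitrary choice for illustration) evaluates both the Fano bound and the weaker bound for a few error probabilities, confirming the chain of inequalities:

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

M = 8  # assumed alphabet size |X|, chosen arbitrarily for this demo
for p_e in (0.01, 0.1, 0.3):
    tight = h2(p_e) + p_e * math.log2(M - 1)  # Fano bound
    loose = h2(p_e) + p_e * math.log2(M)      # weaker bound
    print(f"P_e = {p_e:.2f}: Fano = {tight:.3f} bits <= loose = {loose:.3f} bits")
    assert tight <= loose
```

Note how both bounds shrink toward $0$ as $P_e \to 0$: a reliable estimator leaves almost no residual uncertainty about $X$, which is exactly the intuition the first answer describes.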