Given a random vector $x \in \mathbb{R}^n$ and a classifier $f(x) = \hat{y}$, my goal is to show that the entropy of $\hat{y}$ is at most the entropy of $x$, i.e. $H(\hat{y}) \leq H(x)$.
I began by writing their joint entropy both ways:
- $H(x, \hat{y}) = H(\hat{y}) + H(x | \hat{y})$
- $H(x, \hat{y}) = H(x) + H(\hat{y} | x)$
Based on the property that $H(X,Y) = H(X)$ when $Y = g(X)$, I have rewritten the first equation as
- $H(x) = H(\hat{y}) + H(x | \hat{y})$
and, applying the same property to the second equation, deduced that $H(\hat{y} \mid x) = 0$.
Now, I am trying to find some bound on the first equation that I can leverage to show $H(\hat{y}) \leq H(x)$. Another property is that conditioning cannot increase entropy, $H(X) \geq H(X \mid Y)$; combined with the above this gives $H(\hat{y}) \geq H(\hat{y} \mid x) = 0$, but I am not sure how this could factor into the proof.
This is where I am stuck. If anyone can help me find some bound to complete the proof, it would be much appreciated.
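In case it helps, here is a small numerical sanity check I put together: a Python sketch using an arbitrary toy distribution for $x$ and a hypothetical many-to-one classifier $f$, confirming the two decompositions above and that $H(\hat{y} \mid x) = 0$.

```python
import numpy as np

# Hypothetical toy setup: x takes 4 discrete values with the probabilities
# below, and the "classifier" f maps them deterministically to 2 labels.
p_x = np.array([0.4, 0.3, 0.2, 0.1])   # distribution of x
f = np.array([0, 0, 1, 1])             # y_hat = f(x), not one-to-one

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution p(x, y_hat): all mass sits on the pairs (i, f(i)).
p_joint = np.zeros((4, 2))
p_joint[np.arange(4), f] = p_x

p_y = p_joint.sum(axis=0)              # marginal of y_hat

H_x     = entropy(p_x)
H_y     = entropy(p_y)
H_joint = entropy(p_joint.ravel())

# Conditional entropies via the chain rule: H(A|B) = H(A,B) - H(B).
H_x_given_y = H_joint - H_y
H_y_given_x = H_joint - H_x

print(H_joint, H_y + H_x_given_y, H_x + H_y_given_x)  # all three agree
print(H_y_given_x)                                    # 0, since y_hat = f(x)
```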
You have (in general):
$$ H(\hat{y}) + H(x | \hat{y}) = H(x) + H(\hat{y} |x )$$
Now, because $\hat{y} = f(x)$, you have $H(\hat{y} \mid x) = 0$.
Also, $H(x \mid \hat{y}) \ge 0$ (any entropy is non-negative), with equality iff $x$ is itself a function of $\hat{y}$, say $x = g(\hat{y})$.
Then $$ H(\hat{y}) \le H(x)$$
with equality iff the function $f$ is one-to-one (on the support of $x$).
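As a quick numerical illustration, here is a short Python sketch (with an arbitrary toy distribution and two hypothetical maps, one that merges values and one that is one-to-one) showing $H(\hat{y}) < H(x)$ in the many-to-one case and $H(\hat{y}) = H(x)$ in the one-to-one case.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def pushforward(p_x, f):
    """Distribution of y_hat = f(x) for a deterministic map f on indices."""
    p_y = np.zeros(max(f) + 1)
    for i, pi in enumerate(p_x):
        p_y[f[i]] += pi
    return p_y

p_x = [0.4, 0.3, 0.2, 0.1]        # toy distribution of x

f_merging   = [0, 0, 1, 1]        # not one-to-one: expect strict inequality
f_injective = [2, 0, 3, 1]        # one-to-one: expect equal entropies

print(entropy(pushforward(p_x, f_merging)),   entropy(p_x))  # H(y_hat) < H(x)
print(entropy(pushforward(p_x, f_injective)), entropy(p_x))  # H(y_hat) = H(x)
```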