The entropy of the quantization symbols and the smallest number of bits required to represent the source symbols


Consider $N_s$ source symbols $v$ with PDF

$p(v)= e^{-v}\,I_{[0,\infty)}(v).$

A $k$-bit quantizer $Q$ maps the $N_s$ source symbols $v$ to symbols $s \in \{0,\dots,2^k-1\}$. The quantization intervals are equally spaced, with boundaries at $\ln(2)\cdot s$.

I tried to find the entropy of the quantization symbols $s$ given $k$ as follows:

$H_k= -\sum_{s=0}^{2^k-1}p(s)\log_2\big(p(s)\big)$

I defined $p(s) = 0.5\, e^{-\ln(2)\,s}$ for $s \in \{0,\dots,2^k-2\}$ and $p(s) = e^{-\ln(2)\,s}$ for $s = 2^k-1$:

$H_k= -\sum_{s=0}^{2^k-2} 0.5\, e^{-\ln(2)\,s}\log_2\!\big(0.5\, e^{-\ln(2)\,s}\big) - e^{-\ln(2)(2^k-1)}\log_2\!\big(e^{-\ln(2)(2^k-1)}\big)$

I then plotted $H_k$ as a function of $k \in [1,100]$. The plot shows that $H_k$ becomes constant at $2$ after around $k=10$. How do I interpret this result? Why does the entropy become constant? Is the entropy the lower bound on the number of bits required to represent the source symbols? Does this mean that increasing $k$ doesn't affect the redundancy, or did I make a mistake in my calculation?

There are 2 answers below.

Perhaps you are overthinking things. Do you understand how entropy is calculated in general? Do you understand that if all of the symbols are equally probable then the binary entropy simplifies to $\log(w)/\log(2)$, where $w$ is the distinct symbol count?

Here is some code that computes the entropy of the characters in a string:

#include <iostream>
#include <map>
#include <string>
#include <cmath>
using namespace std;

int main(void)
{
    string input_string = "Hello world!!";
    size_t length = input_string.length();

    double entropy = 0.0;

    if (length > 0)
    {
        // Count how many times each character occurs.
        map<char, size_t> input_map;

        for (size_t i = 0; i < length; i++)
            input_map[input_string[i]]++;

        // Accumulate -sum p * log(p) over the empirical distribution.
        for (map<char, size_t>::const_iterator ci = input_map.begin(); ci != input_map.end(); ci++)
        {
            double probability = ci->second / static_cast<double>(length);

            cout << "Character: \'" << ci->first << "\', Count: " << ci->second << ", Probability: " << probability << endl;

            entropy += probability * log(probability);
        }

        entropy = -entropy;
    }

    // Divide by log(2) to convert from nats to bits.
    cout << "Binary entropy: " << entropy/log(2.0) << endl;

    return 0;
}

The probability mass function of your "quantized" variable is $$p(s)= \frac12 \left(\frac12\right)^s $$ for $s=0,1,\dots,2^k-2$, with the remainder of the probability concentrated on the last value (so that the total sums to one).

Hence, what you have is not really a quantization of the (continuous) exponential distribution (traditionally, the quantization of a continuous variable becomes finer and finer as the number of discrete values increases). You are just computing the entropy of a (discrete) geometric random variable with $p=\frac12$, except that its tail is concentrated at $s=2^k-1$.

As $k\to \infty$ this tends to a true geometric distribution, whose entropy is $2$ bits.
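For completeness, that value follows directly: with $p(s)=2^{-(s+1)}$,

$$H = -\sum_{s=0}^{\infty} 2^{-(s+1)}\log_2\!\left(2^{-(s+1)}\right) = \sum_{s=0}^{\infty} \frac{s+1}{2^{s+1}} = 2\ \text{bits},$$

using the standard sum $\sum_{n\ge 1} n x^{n} = \frac{x}{(1-x)^2}$ evaluated at $x=\frac12$.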

In terms of the "smallest number of bits required" (on average), this can be understood as the following yes/no-question strategy. First ask: "Is $S=0$?" If yes (which happens with probability $1/2$), we needed just one question. Otherwise, ask: "Is $S=1$?", and so on. The number of questions (in coding terms, bits) needed for each value is

$$ \begin{array}{c|l|c|c} S & C & \ell & p \\ \hline 0 & 1 & 1 & 1/2 \\ 1 & 01 & 2 & 1/4 \\ 2 & 001 & 3 & 1/8 \\ 3 & 0001 & 4 & 1/16 \\ \cdots & \cdots & \cdots & \cdots \end{array} $$

and the average is $\sum_s \ell_s p_s = 2$.