Campbell's Source coding


In Shannon's usual source coding problem, one chooses code words that minimize the expected length $E[L]:=\sum_i p_il_i$ over all $L=(l_1,l_2, \dots)$, $l_i\ge 0$, such that $\sum_i e^{-l_i}\le 1$ (the Kraft inequality), where the $p_i$ are the probabilities of the letters.

The answer to this problem is $l_i=-\log p_i$ for all $i$.

Campbell generalized this to a problem where, instead of the expected length, one minimizes $C[L]:=\frac{1}{t}\log\sum_i p_i e^{t l_i}$, $t>0$, subject to the same Kraft inequality. Note that, since $x\mapsto\frac{1}{t}\log x$ is increasing, the $l_i$'s which minimize $C[L]$ are the same as those which minimize just $\sum_i p_i e^{t l_i}$.

The solution to this problem is $l_i = -\log\left(p_i^{\alpha}/\sum_jp_j^{\alpha}\right)$ with $\alpha=\frac{1}{1+t}$. Although this solution is completely different from that of the usual source coding theorem, I heuristically feel that the two problems should have the same solution. After all, the only difference is that in the second case the lengths are exponentiated inside the objective function. So I somehow feel that the $l_i$'s which minimize $\sum_i p_il_i$ should also minimize $\sum_i p_ie^{t l_i}$ (especially since the set of feasible $L$'s is the same for both problems).
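A quick numerical check makes the difference concrete. The sketch below (plain Python, with an assumed three-letter distribution and $t=1$; Campbell's optimal lengths use $\alpha=\frac{1}{1+t}$) shows that both length vectors satisfy the Kraft inequality with equality, yet the Shannon lengths give a strictly larger value of Campbell's objective:

```python
import math

# Assumed example distribution and parameter (not from the original post)
p = [0.7, 0.2, 0.1]
t = 1.0
alpha = 1.0 / (1.0 + t)

# Shannon-optimal lengths: l_i = -log p_i (natural log, matching the Kraft
# inequality sum_i e^{-l_i} <= 1 used in the question)
shannon = [-math.log(pi) for pi in p]

# Campbell-optimal lengths: l_i = -log(p_i^alpha / sum_j p_j^alpha)
Z = sum(pi ** alpha for pi in p)
campbell = [-math.log(pi ** alpha / Z) for pi in p]

def campbell_cost(l):
    """Campbell's objective C[L] = (1/t) log sum_i p_i e^{t l_i}."""
    return (1.0 / t) * math.log(sum(pi * math.exp(t * li)
                                    for pi, li in zip(p, l)))

# Both length vectors satisfy Kraft with equality...
assert abs(sum(math.exp(-li) for li in shannon) - 1.0) < 1e-12
assert abs(sum(math.exp(-li) for li in campbell) - 1.0) < 1e-12

# ...but the Shannon lengths do NOT minimize Campbell's objective.
print(campbell_cost(shannon) > campbell_cost(campbell))  # True
```

For this distribution the gap is substantial: $C[L]$ is $\log 3 \approx 1.10$ at the Shannon lengths versus about $0.94$ at the Campbell lengths.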

I know that my heuristic is wrong. I know that the solutions to both problems are different. Can someone enlighten me?

Answer 1:

Well, you "feel" that, given $x_i^*$ that minimizes $\sum_i p_i x_i$ subject to some restriction (in our case, $x_i\ge 0$ and $\sum_i e^{-x_i}\le 1$ - but that's not essential), the same $x_i^*$ minimizes $\sum_i p_i g(x_i)$, where $g(\cdot)$ is any non-decreasing function. That's false in general.

An example: take $p_i=1$ for all $i$, with the restrictions $x_i\ge 0$, $\sum_i x_i^2 = 1$. Then the objective $\sum_i x_i$ is minimized by $x=(1,0,\dots,0)$, while the alternative objective $\sum_i x_i^3$ is minimized by the uniform choice $x_i = 1/\sqrt{n}$.
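This counterexample is easy to verify numerically. A minimal sketch (plain Python; the choice $n=4$ is arbitrary, just for illustration) evaluates both objectives at the two candidate points:

```python
import math

n = 4  # arbitrary number of coordinates, chosen for illustration

# Two feasible points on {x_i >= 0, sum_i x_i^2 = 1}:
concentrated = [1.0] + [0.0] * (n - 1)
uniform = [1.0 / math.sqrt(n)] * n

linear = lambda x: sum(x)                    # objective sum_i x_i
cubic = lambda x: sum(xi ** 3 for xi in x)   # objective sum_i x_i^3

# The concentrated point wins for the linear objective...
print(linear(concentrated), linear(uniform))  # 1.0 2.0
# ...but loses for the cubic objective.
print(cubic(concentrated), cubic(uniform))    # 1.0 0.5
```

So the minimizer flips between the two objectives even though the feasible set is identical, which is exactly the flaw in the heuristic.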

Answer 2:

The following answer of mine may not provide a rigorous justification for why the solutions to the two optimization problems differ, but Campbell offers an interesting differential-geometric viewpoint on the question, which I present below:

If we consider the initial value problem \begin{align} \frac{d}{du}x_{i}(u)&=\left(l_{i}-\sum\limits_{j=1}^{n}x_{j}(u)l_{j}\right)x_{i}(u),~1\leq i\leq n,~u\geq 0\\ x_{i}(0)&=x_{i}^{0}, \end{align} then the solution to this equation is of the form \begin{eqnarray} x_{i}(u)=\frac{x_{i}^{0}e^{l_{i}u}}{\sum\limits_{j=1}^{n}x_{j}^{0}e^{l_{j}u}},~u\geq 0. \end{eqnarray}
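As a sanity check, one can integrate this ODE numerically and compare the result against the stated closed form. A sketch (plain Python with a hand-rolled RK4 step; the lengths $l_i$ and the initial point are arbitrary choices for illustration):

```python
import math

l = [0.5, 1.0, 2.0]    # arbitrary l_i, chosen for illustration
x0 = [0.2, 0.3, 0.5]   # arbitrary starting point on the simplex

def rhs(x):
    """Right-hand side of the ODE: dx_i/du = (l_i - sum_j x_j l_j) x_i."""
    mean_l = sum(xj * lj for xj, lj in zip(x, l))
    return [(li - mean_l) * xi for li, xi in zip(l, x)]

def rk4(x, du, steps):
    """Classical fourth-order Runge-Kutta integration."""
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * du * ki for xi, ki in zip(x, k1)])
        k3 = rhs([xi + 0.5 * du * ki for xi, ki in zip(x, k2)])
        k4 = rhs([xi + du * ki for xi, ki in zip(x, k3)])
        x = [xi + du / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

u = 1.0
numeric = rk4(x0, 1e-3, 1000)  # integrate from u = 0 to u = 1

# Closed form: x_i(u) = x_i^0 e^{l_i u} / sum_j x_j^0 e^{l_j u}
Z = sum(xj * math.exp(lj * u) for xj, lj in zip(x0, l))
exact = [xi * math.exp(li * u) / Z for xi, li in zip(x0, l)]

# The discrepancy is the integrator's truncation error: tiny.
print(max(abs(a - b) for a, b in zip(numeric, exact)))
```

Note also that the right-hand side sums to zero whenever $\sum_i x_i = 1$, which is why the flow never leaves the simplex.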

Thus, if we start at the point $q=[q_{1},\ldots,q_{n}]^{T}$ on the probability simplex, then we remain on the simplex for all time $u\geq 0$. In the above equation, by setting $l_{i}=\log\left(p_{i}/q_{i}\right)$, we get \begin{eqnarray} x_{i}(u)=\frac{q_{i}^{1-u}p_{i}^{u}}{\sum\limits_{j=1}^{n}q_{j}^{1-u}p_{j}^{u}},~u\geq 0. \end{eqnarray} As $u$ runs from $0$ to $1$, this curve interpolates between $q=[q_{1},\ldots,q_{n}]^{T}$ (at $u=0$) and $p=[p_{1},\ldots,p_{n}]^{T}$ (at $u=1$).

In this setting, minimising $\sum\limits_{i=1}^{n} p_{i}l_{i}$ corresponds to being at $p=[p_{1},\ldots,p_{n}]^{T}$ on the curve $x(u)$ at time $u=1$, while minimising $\frac{1}{t}\log\left(\sum\limits_{i=1}^{n}p_{i}e^{tl_{i}}\right)$ corresponds to being at the point on the curve $x(u)$ at time $u=\frac{1}{t+1}\in(0,1)$.
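To tie this back to Campbell's formula: taking $q$ uniform, the curve point at $u=\frac{1}{t+1}$ is exactly the escort distribution $p_i^{\alpha}/\sum_j p_j^{\alpha}$ with $\alpha=\frac{1}{t+1}$ that appears in the optimal lengths. A sketch verifying this (plain Python; $p$ and $t$ are assumed values for illustration):

```python
import math

p = [0.7, 0.2, 0.1]   # assumed distribution, for illustration
t = 1.0
n = len(p)
q = [1.0 / n] * n     # uniform starting point on the simplex

u = 1.0 / (t + 1.0)

# Point on the interpolating curve x_i(u) = q_i^{1-u} p_i^u / normalizer
W = sum(qj ** (1 - u) * pj ** u for qj, pj in zip(q, p))
curve = [qi ** (1 - u) * pi ** u / W for qi, pi in zip(q, p)]

# Escort distribution from Campbell's optimal lengths, alpha = 1/(t+1)
alpha = 1.0 / (t + 1.0)
Z = sum(pi ** alpha for pi in p)
escort = [pi ** alpha / Z for pi in p]

# The uniform factors q_i^{1-u} cancel, so the two agree.
print(max(abs(a - b) for a, b in zip(curve, escort)))  # ~0 (rounding only)
```

So as $t\to 0$ the time $\frac{1}{t+1}\to 1$ and the curve point tends to $p$ itself, recovering the Shannon solution as a limiting case.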