Entropy analysis in Laplacian pyramid

Update: The paper I mention doesn't need to be read in full in order to answer the question; I'm interested in an analysis of how the entropy changes in two specific processes, so only their technical descriptions are relevant (I briefly mention them below). The article may be found here.

I've recently read the paper The Laplacian Pyramid as a Compact Image Code by Burt and Adelson, and I find it difficult to fully understand how the entropy of the input image and of the other layers of the Gaussian pyramid is reduced. The authors claim that two stages reduce the entropy - subtracting the (expanded) Gaussian-blurred version of an image from the image itself, and quantization - but a formal analysis is lacking.

  1. The authors claim that given an image $x$ (whether the input image or some layer of the Gaussian pyramid), if we denote its (expanded) blurred version by $\widetilde{x}$, then the "image" $x-\widetilde{x}$ (which may contain negative values) has lower entropy than $x$. They write that "the technique of subtracting a predicted value from each image pixel, as in the Laplacian pyramid, removes much of the pixel-to-pixel correlation. Decorrelation also results in a concentration of pixel values around zero, and, therefore, in reduced variance and entropy". I cannot understand why this process reduces the pixel correlations, nor why decorrelation implies reduced entropy.

  2. The authors claim that given an image $x$, a quantized version of it has lower entropy. They write that "entropy can be substantially reduced by quantizing the pixel values in each level of the Laplacian pyramid". I cannot understand why this is true.

Thanks in advance.

BEST ANSWER

In general, suppose we have a stationary random signal $X^n=(X_1,X_2, \dots, X_n)$ (think of the index as "time"), where successive values are not independent, and suppose we are given a (hopefully good) predictor of each value as some deterministic function (typically linear) of the previous ones, ${\hat X_i}=f_i(X_{i-1},X_{i-2}, \dots)$. Let's consider the new signal $Y^n=(Y_1,Y_2, \dots, Y_n)$ formed by the prediction errors: $Y_i = X_i - {\hat X_i}$.
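Here is a minimal numerical sketch of this setup (my own illustration, not from the paper; the AR(1) coefficient, noise scale, signal length and the one-sample predictor are arbitrary choices): a strongly correlated signal is predicted by its previous sample, and the prediction errors come out nearly uncorrelated and with much smaller variance.

```python
# Minimal sketch: prediction errors of a correlated signal.
# All parameters below are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Build a correlated, stationary AR(1) signal X_1, ..., X_n.
n = 100_000
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = 0.95 * x[i - 1] + rng.normal(scale=0.3)

# Predictor: X_hat_i = X_{i-1}; prediction error Y_i = X_i - X_hat_i.
x_hat = x[:-1]
y = x[1:] - x_hat

print("var(X) =", x.var(), " var(Y) =", y.var())                 # Y has much lower variance
print("corr(X_i, X_{i+1}) =", np.corrcoef(x[:-1], x[1:])[0, 1])   # strongly correlated
print("corr(Y_i, Y_{i+1}) =", np.corrcoef(y[:-1], y[1:])[0, 1])   # nearly zero
```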

How do $X^n$ and $Y^n$ compare? First notice that they are related by a one-to-one deterministic map ($X^n$ can be recovered from $Y^n$ recursively), hence the total amount of information is the same: $H(X^n)=H(Y^n)$.

However, if the predictor is reasonably good, we expect $Y^n$ to do "better" in several senses:

  1. $Y^n$ is (almost) white: the prediction errors $Y_1, Y_2, \dots$ have lower correlation (ideally zero) than $X_1, X_2, \dots$
  2. $Y_i$ has almost zero mean, and low variance.
  3. The "marginal" entropy is lower: $H(Y_i) < H(X_i)$

Property $2$ is plausible: if the predictor is good, then the prediction error should be (on average) small, hence the usual dispersion measures, in particular the variance, should decrease.

Property $1$ is perhaps less obvious but, intuitively, it makes sense: when we build the prediction, we extract as much information from the past as possible; what remains is the truly new information, what could not be guessed from the past (the innovation).

Property $3$ can be justified from the previous two. Property 2 tells us that the probability distribution of $Y_i$ will be narrow, hence its entropy should be small. And property 1 tells us that $H(Y^n) \approx H(Y_1) + H(Y_2) + \cdots = n\,H(Y_i)$ (by stationarity); but this equals $H(X^n) < n\,H(X_i)$ (the $X_i$ are correlated), hence $H(Y_i) < H(X_i)$.
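As a concrete worked example (my addition; a standard computation, not part of the original argument): take a stationary Gaussian AR(1) process $X_i = \rho X_{i-1} + Z_i$ with i.i.d. $Z_i \sim \mathcal{N}(0,\sigma^2)$ and $|\rho|<1$. The predictor $\hat X_i = \rho X_{i-1}$ gives prediction errors $Y_i = Z_i$. Since $\operatorname{Var}(X_i) = \sigma^2/(1-\rho^2)$, the (differential) entropies are $h(X_i) = \frac{1}{2}\log\big(2\pi e\,\sigma^2/(1-\rho^2)\big)$ and $h(Y_i) = \frac{1}{2}\log\big(2\pi e\,\sigma^2\big)$, so $h(Y_i) - h(X_i) = \frac{1}{2}\log(1-\rho^2) < 0$: the stronger the correlation, the larger the entropy reduction.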

All this is well known in signal processing, and is used, for example, in PNG image compression (which is based on pixel prediction).

Coming to your paper: forming $y = x - \widetilde{x}$, where $\widetilde{x}$ is a blurred (and expanded) version of the image, amounts to applying a prediction filter (in a typical image neighbouring pixels are similar, so an average of the neighbours is a "prediction" of the current pixel), and, indeed, we expect $y$ to have low variance and entropy.
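Here is a minimal sketch of that prediction step (not the exact REDUCE/EXPAND filters of Burt and Adelson, which use a particular 5-tap generating kernel; I substitute scipy's gaussian_filter and zoom, and a synthetic test image, so all numbers are only indicative):

```python
# Minimal sketch of one Laplacian-pyramid-style prediction step.
# entropy_bits is an empirical histogram entropy of the rounded pixel values.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def entropy_bits(img):
    """Empirical entropy (bits/pixel) of the integer-rounded pixel values."""
    _, counts = np.unique(np.round(img).astype(int), return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

# Synthetic smooth test image with mild noise (stands in for a natural image).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:256, 0:256]
x = 128 + 100 * np.sin(xx / 20.0) * np.cos(yy / 25.0) + rng.normal(0, 2, (256, 256))

# REDUCE then EXPAND: blur + downsample, then upsample back to full size.
blurred = gaussian_filter(x, sigma=2)
reduced = zoom(blurred, 0.5)      # coarser level of the Gaussian pyramid
x_tilde = zoom(reduced, 2.0)      # expanded prediction of x

y = x - x_tilde                   # Laplacian-level "error image"

print("var(x) =", x.var(), " var(y) =", y.var())
print("H(x) ~", entropy_bits(x), "bits/pixel;  H(y) ~", entropy_bits(y), "bits/pixel")
```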

That quantization (again, in general) reduces the entropy should be quite obvious: quantization is a deterministic many-to-one map, and applying a deterministic function can only decrease entropy, $H(Q(X)) \le H(X)$, with a strict decrease whenever distinct values with positive probability are merged into the same bin.
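A minimal sketch of this effect (my own illustration on arbitrary synthetic data): a uniform quantizer with growing bin width merges histogram bins, so the empirical entropy can only drop.

```python
# Minimal sketch: coarser quantization merges bins, so empirical entropy decreases.
import numpy as np

def entropy_bits(values):
    """Empirical entropy (bits/sample) of a discrete-valued array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
x = rng.normal(0, 10, 100_000)

for step in (1, 2, 4, 8):
    q = step * np.round(x / step)   # uniform quantizer with bin width `step`
    print(f"bin width {step}: H ~ {entropy_bits(q.astype(int))} bits")
```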