Warm-up problem (with solution): Suppose I have a measuring device that spits out a stream of real numbers, which I will model as i.i.d. random variables with range $\mathbb{R}$ and continuous probability distribution $P$. I wish to record a large number of measurements in a digital form, and I am prepared to tolerate some loss of precision to make the recording compact. Let us suppose that the appropriate definition of the amount of loss is the mean squared difference between the measurements and the recording.
I need to solve two sub-problems:
- First, choose a suitable encoding function $e:\mathbb{R}\to\mathbb{N}$ and a decoding function $d:\text{range}(e)\to\mathbb{R}$.
- Second, choose a lossless compression algorithm that efficiently represents the stream of elements of $\mathbb{N}$ using a stream of binary digits.
I would like to focus on the first sub-problem only: the quantisation scheme. Let us just assume that the second sub-problem has an easy near-optimal solution.
It is common engineering practice to choose $d(n) = nq$ (with $e(x) = \operatorname{round}(x/q)$) for some "quantum" $q$, which is an adjustable parameter that controls the quality of the recording. For example, JPEG uses this scheme. Let us call such a scheme "uniform", since it is appropriate if $P$ is uniform (on some interval much larger than $q$). However, for typical, highly non-uniform $P$ (as in JPEG!), this scheme fails at least one of the following sanity conditions:$\newcommand{\argmin}{\operatorname{argmin}}$
$$d(n) = \mathbb{E}_P(x | e(x)=n)$$ $$e(x) = \argmin_n |d(n)-x|$$
Here I have used the assumption that loss is measured in a mean squared sense. The first condition minimises the loss with respect to $d$ for fixed $e$, and the second minimises the loss with respect to $e$ for fixed $d$. If one of these is not satisfied, then the quantisation scheme can be improved, i.e. the loss can be reduced and/or the compression improved.
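To make the failure concrete, here is a small sketch (the exponential distribution and all of the parameters are my own illustrative choices) showing that the uniform scheme violates the first condition for a skewed $P$:

```python
import random

# Sanity-check the first condition, d(n) = E[x | e(x) = n], for the
# uniform scheme d(n) = n*q, using an exponential distribution as a
# stand-in for a "typical non-uniform P".
random.seed(0)
q = 1.0
n = 1                                    # look at the bin decoded to d(n) = 1.0
lo, hi = (n - 0.5) * q, (n + 0.5) * q    # nearest-neighbour cell of d(n)

samples = [random.expovariate(1.0) for _ in range(200_000)]
in_bin = [x for x in samples if lo <= x < hi]
cond_mean = sum(in_bin) / len(in_bin)

print(f"d(n) = {n * q:.3f}, E[x | e(x)=n] = {cond_mean:.3f}")
# The conditional mean sits below the bin centre, because P puts more
# mass on the left half of the cell: the uniform scheme is not "sane".
```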
Definition: Let us say that the quantisation scheme $(e, d)$ is "sane" if it satisfies the above two conditions.
Given $P$, it is possible to compute a sane quantisation scheme by using the sanity conditions as an iterative algorithm. This is similar to the well-known K-means clustering algorithm.
Update 2013-11-18: A commenter has identified this as Lloyd's Algorithm.
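A minimal sketch of that iteration, run on samples drawn from $P$ rather than on $P$ itself (the Gaussian source, $N = 4$, the iteration count and the initialisation are my own illustrative choices):

```python
import random

def lloyd(samples, N, iters=60):
    """Alternate the two sanity conditions until the codepoints settle."""
    # Start from N arbitrary distinct codepoints d(0), ..., d(N-1).
    d = sorted(random.sample(samples, N))
    for _ in range(iters):
        # Condition 2: e(x) = argmin_n |d(n) - x|  (nearest codepoint).
        bins = [[] for _ in range(N)]
        for x in samples:
            n = min(range(N), key=lambda n: abs(d[n] - x))
            bins[n].append(x)
        # Condition 1: d(n) = E[x | e(x) = n]  (centroid of each bin).
        d = [sum(b) / len(b) if b else d[n] for n, b in enumerate(bins)]
    return sorted(d)

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(20_000)]
codes = lloyd(xs, 4)
print(codes)  # codepoints crowd toward the mode of P
```

For a Gaussian $P$ the density is log-concave, so the sane scheme is unique and the iteration should land near the same four codepoints regardless of the starting point.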
Side-problem: If we suppose that $e$ is an increasing function onto $[0, N)$, then I think $N$ often uniquely determines the scheme, i.e. the starting point of the iteration doesn't matter. This is true for the uniform distribution, for example. Also, I don't think $e$ can ever have an infinite range. If both properties hold, then the sane schemes form a one-parameter family, in which the parameter $N$ controls the amount of loss. I would be interested to discover necessary or sufficient conditions for these properties to hold.
Actual problem: Now consider the analogous problem but for a measuring device that spits out elements of, say, $\mathbb{R}^{16}$.
It is common engineering practice (e.g. JPEG) to quantise each of the components separately. However, a better analogue of the "uniform" quantisation scheme comes from viewing quantisation as a sphere-packing problem: let $d(e(\vec x))$ be the nearest point to $\vec x$ in some sphere-packing lattice, e.g. http://en.wikipedia.org/wiki/Barnes%E2%80%93Wall_lattice . As before, though, this uniform scheme is sub-optimal for typical real-world $P$.
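Decoding Barnes-Wall is fairly involved, so as a simpler illustration, here is the classic nearest-point routine for the checkerboard lattice $D_n$ (integer vectors with even coordinate sum), in the style of Conway and Sloane's fast-quantizing constructions; a sketch, not production code:

```python
def nearest_Dn(x):
    """Nearest point of D_n (even-coordinate-sum integer vectors) to x."""
    f = [round(c) for c in x]            # nearest point of Z^n
    if sum(f) % 2 == 0:
        return f                          # already in D_n
    # Otherwise re-round the coordinate with the largest rounding error
    # to its second-nearest integer; this fixes the parity at least cost.
    i = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[i] += 1 if x[i] > f[i] else -1
    return f

print(nearest_Dn([0.6, 0.1, 0.1, 0.1]))  # -> [0, 0, 0, 0]
```

(Note that the naive per-coordinate rounding would give $[1, 0, 0, 0]$, which is not in $D_4$ and is in fact farther from the input than $[0, 0, 0, 0]$.)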
We can define a "sane" quantisation scheme in exactly the same way as in the 1-dimensional case. In principle, the iterative approach to computing a sane scheme still works. However, in practice there are just too many codepoints to compute and store explicitly. And if there aren't, I can just increase the $16$ until there are.
So instead of considering arbitrary $d$, we must look for $d$ with some sort of structure that allows it to be represented more compactly, e.g. as a distortion of a well-known lattice or something. But it feels a bit like a sphere-packing problem in a curved space, so it is not clear that distorting the uniform quantisation scheme is the right thing to do.
What would you suggest?
For an easier sub-problem, suppose $P$ is radially symmetric. This is not typical of real-world data, but it would be helpful to see a good solution to this case, as I think it includes an interesting part of the problem.
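One natural (though not obviously optimal) attack on the radially symmetric case is to quantise in polar form: a sane 1-D scheme for the radius, and uniform sectors for the direction. Here is a 2-D sketch with hand-picked shell thresholds and sector count of my own choosing; note that it enforces only the first sanity condition (centroid decoding), not the second:

```python
import math
import random

random.seed(2)
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50_000)]

R = [0.7, 1.6]      # radius decision thresholds (3 shells), hand-picked
M = 8               # angular sectors per shell

def encode(x, y):
    """Map a point to a (shell, sector) cell in polar coordinates."""
    r, th = math.hypot(x, y), math.atan2(y, x)
    shell = sum(r > t for t in R)
    sector = int((th % (2 * math.pi)) / (2 * math.pi) * M) % M
    return shell, sector

# Sanity condition 1 fixes the decoder: d = conditional mean per cell.
cells = {}
for x, y in pts:
    cells.setdefault(encode(x, y), []).append((x, y))
decode = {k: (sum(x for x, _ in v) / len(v), sum(y for _, y in v) / len(v))
          for k, v in cells.items()}

mse = sum((x - decode[encode(x, y)][0]) ** 2 +
          (y - decode[encode(x, y)][1]) ** 2 for x, y in pts) / len(pts)
print(f"polar codebook of {len(decode)} cells, MSE = {mse:.3f}")
```

Even this crude version suggests the shape of the answer: concentric shells whose radii come from a 1-D sane scheme, with a packing problem on each sphere; whether the sector count should vary by shell is part of what a good solution would settle.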