Sorted byte arrays with unique values - best possbile compression

290 Views Asked by At

I have byte arrays with following constraints:

  • Length between 1 and 256
  • Length median about 128, but I have to verify this on larger dataset
  • Values are sorted ascending
  • Values are unique

I am looking for algorithm for best compression of this data. Maximal uncompressed size for array if it is full is 256B. For median it is 128B.

For now best compression I know is to use bit-field to store if byte is in array or not, and I can omit trailing zeros. So for one array i will use ceiling("max value" / 8) B. For full array (or array containing 248) this will be 32B.

This will reduce size in general, but it is bad for sparse arrays with length < 32. I can use flag to store data compressed or uncompressed if it turns out that uncompressed array is smaller than compressed.

Is there any other trick/optimization/compression i can use to reduce size even more?

Is there any way to calculate theoretical boundary how much it is possible/not possible to compress this class of data so that I can measure quality of my approaches?

One short example of data to eliminate misunderstandings, please note that this array is shorter than expected array in data: { 0, 1, 5, 7, 88, 105, 233, 234, 235, 255 }