What's this compression technique called?

Question

What's this compression technique called?

67 Views Asked by Bumbble Comm At 09 Apr 2026 - 1:38

Consider this string of 1's, 0's (spaces added for readability):

1010 1010 1010 1010

We can describe that as:

Starts with 1
16 bits
Does not contain:
- 00
- 11

In general, we can describe any finite string of bits with its length, start bit and listing enough finite substrings that are not contained in the string.

We can brute force the minimum description. (The list of substrings that has minimum overall length)

What's this compression technique called?

String could be "approximated" by a substring list that can derive multiple strings.

"Similar" strings may have substrings in common.

"Complicated strings" maybe have many substrings. Or high minimum substring length.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Answer 1 · 2020-05-10 08:06:32

As given, your algorithm is neither precise enough nor efficient. However, there is an efficient compression technique using forbidden words, called DCA compression, or Data Compression using Antidictionaries, see [1].

[1] M. Crochemore, F. Mignosi, A. Restivo, S. Salemi, Text compression using antidictionaries, Automata, languages and programming (Prague, 1999), 261--270, Lecture Notes in Comput. Sci., 1644, Springer, Berlin, 1999.

EDIT (Answer to the comment). Your suggested method would not compress well. For instance, in your example, you would need to transmit:

the first letter $w_1$ of $w$ (1 bit),
the length of the word (here $16$ or $10000$ in binary, that is $5$ bits),
the words $01$ and $10$ (4 bits).
You also need separators to be able to parse the data in 2 and 3, a nontrivial issue. Even if you assume that the length of $w$ is bounded by $31$, so that $5$ bits will always suffice for the length, how do you code the set $\{01, 10\}$ efficiently?

In your case, a coding by repetition would be more efficient. Your word is $(10)^8$, which you could code as follows: first use $\color{red}5$ bits to code $8$ (assuming again that the length of $w$ is bounded by $31$), and then write $\color{blue}{10}$. Altogether, you would get $\color{red}{01000}\color{blue}{10}$ (7 bits), a much better result than your suggestion.

Furthermore, your technique might be difficult to implement for complexity issues. Formally, you suggest to find a set of words $v_1, \ldots, v_n$ such that $$ w_1\bigl(\{0,1\}^* - \{0,1\}^*(v_1 + \dotsm + v_n)\{0,1\}^*\bigr) \cap \{0,1\}^{|w|} = \{w\} $$ The complexity ( = the number of states of the minimal deterministic automaton) of a language of the form $\{0,1\}^*v\{0,1\}^*$ is $|v| + 1$, but the complexity of $\{0,1\}^*(v_1 + \dotsm + v_n)\{0,1\}^*$ might be $O(|v_1] \dotsm |v_n|)$ (See this question for a similar issue). Thus the decoding might be in $O(2^n)$.

The coding might also be difficult, as I don't see any efficient algorithm to find the set of forbidden factors $\{v_1, \dotsm, v_n\}$.

What's this compression technique called?

There are 1 best solutions below

Related Questions in TERMINOLOGY

Related Questions in BINARY

Related Questions in BIT-STRINGS

Related Questions in COMPRESSION

Trending Questions

Popular # Hahtags

Popular Questions