I'm trying to normalize the Lempel Ziv complexity measure, but I can't get it right.
I'm following the Lempel and Ziv (1976) publication where they show a theorem which defines the upper bound of LZC defined as:
$$ C(S) < \frac{N}{(1-\varepsilon_{n})\;log_\alpha(N)} $$
where
$$ \varepsilon_{n} = 2 \frac{1 + log_\alpha\;log_\alpha(\alpha N)}{log_\alpha(N)} $$
Here $N$ is the length of the sequence in study, and $\alpha$ is the number of symbols the sequence is composed of.
I think I understand this. But let's consider the binary sequence S = "0001101001000101", which has a complexity of $C(S) = 6$. The length $N$ of this sequence is 16, and $\alpha$ is 2 since it is a binary sequence.
So according to Lempel and Ziv, the following should be true:
$$ C(S) < \frac{16}{(1-\varepsilon_{n})\;log_2(16)} $$
and
$$ \varepsilon_{n} = 2 \frac{1 + log_2\;log_2(32)}{log_2(16)} $$
But when I resolve this, I get that the upper bound is equal to -6.0517. The value not only is lower than the complexity value, but is also negative. What am I doing wrong?
Even though it's not explicitly stated, regardless of the LZ application, the function $$ \varepsilon_{n} = 2 \frac{1 + \log_2\;\log_2(2n)}{\log_2(n)} $$ which is monotone decreasing for $n\geq 1,$ does not go below $1$ until $n$ is large. So it seems the inequality $\varepsilon_n<1$ won't apply before this, yielding a negative upper bound.
I suggest checking Lempel and Ziv's other papers to see if this function is used again, and if its use is clearer, stated for large $n.$