Given all permutations of N-digits, what is the ratio of random-seeming vs non-random-seeming sequences?

44 Views Asked by At

My question is inspired by the comment in this article that the 20 digit sequence 03729563829603547134 seems "more random" than 99999999999999999999. One could use some measure of compressibility to quantify the "seeming randomness" of a given N-digit sequence: some threshold of compressibility could be set to distinguish "random-seeming" (RS) from "non-random-seeming" (NRS) sequences.

In such a situation, would there be a way to calculate the ratio of NRS sequences to RS sequences? Intuition suggests such a ratio would be small; but how small? To make it even more concrete, suppose we set the compressibility threshold at 80%. How can one calculate the ratio of >=80%-compressible 20-digit sequences to <80%-compressible 20-digit sequences? Does the ratio decrease with sequences of increasing length?

1

There are 1 best solutions below

4
On

The simplest method to get an answer would be to use the pigeonhole principle.

Let's assume we are working in base $b$, are looking at sequences of length $n$ and are considering a sequence to be 'well compressed' if it has length at most $m$.

Then we can see that the following is the number of sequence of length at most $m$, $K_m$

$$K_m=\sum_{i=0}^{m}b^i=\frac{b^{m+1}-1}{b-1}$$

Now note that there are $b^{n}$ sequences of length $n$.

As each 'non-random' sequence must be given a compressed form of length at most $m$, we see that the ratio of 'non-random' sequences to all sequences $r_{n,m}$ is at most

$$r_{n,m}=\frac{K_m}{b^n}=\frac{b^{m+1}-1}{b^{n+1}-b^n}$$

The ratio of 'non-random' to 'random' is then,

$$R_{n,m}=\frac{r_{n,m}}{1-r_{n,m}}=\frac{b^{m+1}-1}{b^{n+1}-b^n-b^{m+1}+1}$$

To get any better value than this would require you to choose a particular method of compression or add extra constraints.

The particular case you mention gives a ratio of at most $1.1111\times10^{-16}$.