Given a binary sequence, how can I calculate the quality of the randomness?
Following the discovery that Humans cannot consciously generate random numbers sequences, I came across an interesting reference:
- "A professor of probability would have half of his class write down a list of zeros and ones from a bunch of coin tosses and the other half write down a list of zeros and ones that they would try to make look random. The professor could sort out the papers with only a glance at each paper."
What method would be suitable to replicate the professor's act, i.e. judging which sequence is likely to be generated by a random process?
A method I have in mind is: for a given sequence-length $ n $, establish the frequencies of each possible sub-sequence in the population of all possible sequences and than compare these with a particular sequence.
As this very quickly becomes impractical (the number of sub-sequences grows exponentially with $ n $) the method may, instead, only measure a subset of all sub-sequences, e.g. all sequences of same digit.
How good would such a method work? What are some better methods?
Although, I do think it might be more fitting to ask somewhere else, I will try to answer this from a mathematical perspective.
As you noted, counting subsequences can get impractical, and doesn't, in a reasonable pratical environment, give ideal results, instead a varity of other techniques can be used. These are known as randomness tests.
Your example is one of many randomness tests. Other well-known tests are (I take the liberty to reframe your question in terms of sequence, $ f : \mathbb{N} \to \mathbb{Z}_n $ for some $ n $),
The Wald–Wolfowitz runs test. Extract uniformly distributed reals on the interval $ (0, 1) $. Count ascending and descending runs. We know that the mean ought to be $ \frac{2 N_{+} N_{-}}{N} + 1 $. It follows that the variance will ideally be $ \sigma^2 = \frac{(\mu - 1)(\mu - 2)}{N - 1} $ (with $ N_+ $ being number of ascending runs and $ N_- $ being the number of descending runs, and $ N = N_+ + N_- $).
The overlapping sum test. Generate $ n $ reals on $ (0, 1) $, add sequences of $ m < n $ consecutive entries. The sums should then be normally distributed with characteristic mean and variance.
Matrix rank test. Through the random sequence, form a matrix over $ \mathbb{Z}_n $, then determine the rank of the matrix, and examine the distribution.
To make it even more effective, one can add a number of classes of bijective combining functions, $ g_m: \mathbb{Z}_n \to \mathbb{Z}_n $. One can then construct a new sequence by $ f'(n) = g_n(f(n)) $ and perform the randomness tests on $ f' $. Examples of such function classes include,
Adding sequent elements, $ g_n(x) = x + g_{n - 1}(f(n - 1)) $, with $ g_0(x) = x $.
Constant addition, $ g_n(x) = x + c $ for some $ c $.
Multiplying by some $ p $ relatively prime with the modulo, $ g_n(x) = px $.
and so on.
This is used as a tool for extracting patterns (e.g. weakening the function by identifying and abusing its patterns). Since a truely random sequence is pattern-less, this can be used for "measuring" randomness.