Formal statement of property of randomness of a sequence


Suppose we have a probability space $(\Omega,{\mathscr F},P)$ consisting of

  • An arbitrary nonempty set $\Omega$

  • A collection ${\mathscr F}$ of subsets of $\Omega$ that forms a $\sigma$-algebra on $\Omega$

  • A probability measure $P: {\mathscr F} \rightarrow [0,1]$

I'm reading a text which shows how to generate a sequence of numbers $X_1,X_2,\ldots,X_n$ such that the sequence is distributed $U(0,1)$ and is "random" in some sense. That's two properties:

  1. Has the right distribution. This can be established by a convergence-in-distribution test, $\lim_{n\rightarrow\infty}F_{X_n}(x) = F_X(x)$, where $F_{X_n}$ is the empirical CDF of the sequence and $F_X$ is the theoretical CDF of the target distribution (a concrete version of such a test is sketched just after this list).

  2. Sequence is "random". There is a large literature on the notion of randomness and a very small and ad hoc literature on practical testing of sequences for randomness. Maybe what I am looking for is Martin-Löf randomness.
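
Regarding property 1, here is a minimal sketch of one standard distribution-fit test, the one-sample Kolmogorov-Smirnov test, which compares the empirical CDF of a sample against the theoretical CDF (my example, not from the text):

import numpy as np
from scipy import stats

# Hypothetical sample from the generator under test.
x = np.random.default_rng(0).random(10_000)

# One-sample KS test against the U(0,1) CDF; the statistic is the largest
# vertical gap between the empirical CDF and the theoretical CDF.
stat, p = stats.kstest(x, 'uniform')
print(stat, p)  # a large p-value is consistent with U(0,1)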

One version of the Martin-Löf randomness definition is that "A sequence is Martin-Löf random if and only if no constructive martingale succeeds on it."

Q1. What is considered the "best practice" to formally state property 2? See for example this paper.

Q2. The Martin-Löf definition in Wikipedia is independent of the probability space. Is this correct? In general, is the randomness of a sequence independent of the probability distribution it converges to?

One possible answer, adapted from Wikipedia. It's not as constructive as I'd like, and it doesn't reference a particular probability distribution, which also seems undesirable:

  • Let $\Omega^\ast$ be the set of finite strings over the alphabet $\{0,1\}$, and for a string $w \in \Omega^\ast$ let $w0$ and $w1$ denote its one-bit extensions. A martingale is a function $d:\Omega^\ast \rightarrow [0,\infty)$ such that for all strings $w$, $d(w) = \frac{1}{2}\left(d(w0) + d(w1)\right)$. A martingale $d$ is said to succeed on an infinite sequence $S$ if $\limsup_{n\rightarrow\infty} d(S_1 \ldots S_n) = \infty$, where $S_1 \ldots S_n$ is the length-$n$ prefix of $S$. A martingale is constructive if there exists a computable function ${\hat d}: \Omega^\ast \times {\mathbb N} \rightarrow {\mathbb Q}$ such that for all strings $w$ and all $t>0$, ${\hat d}(w,t) \leq {\hat d}(w,t+1) < d(w)$ and $\lim_{t\rightarrow\infty} {\hat d}(w,t) = d(w)$. A sequence is Martin-Löf random if and only if no constructive martingale succeeds on it.
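
To make the definition concrete, here is a tiny illustration (my sketch, not part of the Wikipedia definition): the "bet everything on 0" martingale, $d(w0) = 2d(w)$ and $d(w1) = 0$, satisfies the fairness condition $d(w) = \frac{1}{2}(d(w0)+d(w1))$ and succeeds on the all-zeros sequence, which is therefore not Martin-Löf random.

def martingale_capital(bits):
    # Capital of the 'bet everything on 0' martingale after reading the bits.
    d = 1.0
    for b in bits:
        d = 2 * d if b == 0 else 0.0
    return d

print(martingale_capital([0] * 20))   # 1048576.0 -- capital diverges on 000...
print(martingale_capital([0, 1, 0]))  # 0.0 -- one wrong bet wipes out the capital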

That is, the definition above is not constructive in the sense that it doesn't supply an explicit test for the randomness of a particular realized sequence with respect to a particular probability distribution. The Wikipedia page on tests of randomness lists a number of fairly ad hoc criteria, none of which looks mathematically rigorous.

Also I just found an equivalent question here.

Another related criterion is whether or not the sequence is incompressible. This test, however, would reject a single output sequence that happens, by chance, to be compressible. The implication is that tests of randomness would need to work on multiple samples from a random sequence generator, not a single output, to arrive at a conclusion. So in that view, if the outputs are highly incompressible on average, then the random sequence generator is good (one can imagine a distribution of compressibility where a certain shape of the realized distribution corresponds to "random"). The ent program uses compressibility tests to assess the randomness of a sequence.
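
As a quick illustration of the incompressibility idea (my sketch using zlib, not how ent works internally):

import os, zlib

random_bytes = os.urandom(10_000)   # high-entropy input
patterned = b'0123456789' * 1_000   # highly regular input
for name, data in [('random', random_bytes), ('patterned', patterned)]:
    print(name, len(zlib.compress(data)) / len(data))
# The random bytes give a ratio near (or slightly above) 1, i.e. incompressible;
# the patterned input compresses to a small fraction of its size.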

In the end what I'm concerned about in Q2 is tests of randomization. The most satisfying test that occurred to me is to generate many sequences from the RNG, then plot the empirical PDF of the compression ratio of the sequences, i.e. take a good compression algorithm and divide the original sequence size by the compressed size. If that clusters around 1, then you have good randomization.

Q1, distribution fit, has many accepted tests; it is Q2 that is less discussed.


There are two answers below.

First answer:

There are different uses of the word "random"; my answer to another question, linked below, discusses some of them. If the text you mentioned is describing an algorithm for generating numbers--it sounds like it is--then that's a pseudorandom number generator (PRNG), and the generated sequence cannot be Martin-Löf random: M-L randomness implies that there is no way to generate the sequence that is shorter than the sequence itself, and a PRNG is by definition a very succinct way to generate a sequence.
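
To see how succinct, here is a minimal sketch (my illustration, not from the text in question) of a linear congruential generator: a few lines of code deterministically produce an output sequence of arbitrary length, so the sequence has a description far shorter than itself.

from itertools import islice

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    # Classic linear congruential generator, scaled into [0, 1).
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

print(list(islice(lcg(42), 5)))  # the entire sequence is determined by the seed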

This is a similar question, and my answer provides some literature references on Martin-Löf randomness and philosophical discussions of randomness. The Volchan paper you mentioned is another source; I have seen it but have not read it. (I don't consider your question a duplicate, since your question is much more detailed.)

One criterion for randomness of a finite sequence, which is extremely closely related to Martin-Löf randomness (I'm being vague to avoid making subtle mistakes), due to Kolmogorov and Chaitin, is roughly that a finite sequence is random if it can't be generated by an algorithm that's shorter than the sequence itself (e.g. as a program for a fixed universal Turing machine). In principle you could test for randomness by running through all possible programs that are shorter than the sequence--not very efficient, and in fact not decidable in general, since some of those programs may never halt. This won't work for infinite sequences, so for those you need Martin-Löf's definition. (This point is something I am studying currently, so I am repeating what I've read rather than drawing on my own understanding.)

For information on testing of PRNGs, good starting points include the following: L'Ecuyer's site, which hosts the TestU01 suite incorporating many PRNG tests (including tests like those described by Knuth); L'Ecuyer's paper with Simard describing TestU01, which is very useful; and a few other test suites that may be worth using as well. A visual test like the one you described can be useful, but it's just a starting point, and likewise for ent. I would recommend consulting sources like these rather than trying to come up with your own tests.

Additional details, partly based on my comments:

PRNGs are kind-of sort-of good-enough-for-a-purpose random, and they are tested with statistical tests of the same sort that are used in frequentist statistical methods in science. No formal definition is possible.

A PRNG is good if it passes a large number of well-known statistical tests for independence between combinations of r.v.'s. That is, it's good if, according to statistical tests, the sequence of numbers looks like a sequence that would be generated by independent r.v.'s. In this sense, testing a PRNG is like testing some real random process realized in the world, except that in the case of a PRNG there are no distinct r.v.'s; there is just the one algorithm.
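
For concreteness, here is a minimal sketch of one such test (my example, not the answer's): a chi-square frequency test checking that the output falls into equal-width bins with roughly equal frequency.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)       # the PRNG under test (hypothetical choice)
x = rng.random(100_000)
observed, _ = np.histogram(x, bins=10, range=(0, 1))
chi2, p = stats.chisquare(observed)  # default expected counts are uniform
print(chi2, p)  # a tiny p-value would indicate the generator fails this test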

Roughly, a sequence is ML random if it passes all possible statistical tests for the outputs being from independent combinations of r.v.s. Except that in the algorithmic randomness/complexity world, there are no r.v.s, and there is no generator; there is only the sequence.

That is, although M-L randomness is based on something like PRNG testing on steroids (or PRNG testing is the poor cousin of how M-L randomness is defined), there is a difference in what is actually being tested--a sequence itself, in one case, and a sequence-generator in the other.

Finally, about $U(0,1)$ and probability spaces: Most PRNGs are designed to emulate independent trials of $U(0,1)$ r.v.'s. This makes testing easier, and it's an obvious choice. There are methods for transforming the output so that it will emulate other distributions, e.g. in this book. Some books or articles define M-L randomness only for uniform distributions on binary sequences (but this has implications for integers in other bases or for rationals and reals). Sometimes M-L randomness is defined directly for uniform distributions on numbers in arbitrary bases (see the Calude book). Sometimes M-L randomness is defined for arbitrary probability distributions--not just uniform distributions--on binary numbers (as in Li and Vitanyi--but most of the examples use uniform distributions).
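
As one standard example of such a transformation (my sketch, the inverse-CDF method; the book may use others), uniform draws can be mapped through the inverse CDF of the target distribution:

import numpy as np

rng = np.random.default_rng(0)
u = rng.random(100_000)   # emulated U(0,1) draws
x = -np.log1p(-u)         # inverse CDF of Exponential(1): F^{-1}(u) = -ln(1 - u)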

I would say that M-L randomness is independent of a probability space in one sense, in that there's no need to characterize trials in terms of a probability space. There is a probability space, though. It's just that it's usually implicit. The algebra is a product algebra over a finite alphabet (usually $\{0,1\}$), and the probability distribution is usually a uniform distribution. With infinite sequences, my understanding is that this can be understood in terms of a Cantor space and Lebesgue measure (but I am still learning about this, and I may have misstated the point). But look at Li and Vitanyi's definition of a Martin-Löf test in their chapter 2 (at least in the 3rd or 4th ed); they represent the probability distribution over binary sequences explicitly.
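
For concreteness, one standard way to make that implicit space explicit (my gloss, following the usual Cantor-space construction, not a quotation from Li and Vitanyi): take $\Omega = \{0,1\}^{\mathbb N}$, let ${\mathscr F}$ be the $\sigma$-algebra generated by the cylinder sets $[w] = \{S : w \text{ is a prefix of } S\}$ for finite binary strings $w$, and set $P([w]) = 2^{-|w|}$. This $P$ is the uniform (Lebesgue) measure on infinite binary sequences, which is the measure implicitly used in the standard Martin-Löf definition.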

(The "equivalent" question to which you linked has answers and discussion that are somewhat relevant. However, most of the answers are focused on unit testing, which I don't see as entirely relevant to your questions. You can decide.)

Second answer:

Stating my intuition here.

Let $(\Omega,{\mathscr F},P)$ be a probability space.

Let $X^1,\ldots,X^m$ be a set of $m$ random sequences of length $n$ drawn from $P$, so that $X^i=(X_1^i,\ldots,X_n^i)$.

Let ${\mathscr F}^\ast$ be a compressed representation of events in ${\mathscr F}$. Let $C:{\mathscr F}\rightarrow {\mathscr F}^\ast$ be an invertible compression function. Let $R:{\mathscr F}^\ast \rightarrow [0,1]$ be a compression ratio function which gives the ratio of the compressed size to the uncompressed size.

Let $\mathrm{epdf}(S)$ be the normalized histogram or empirical PDF of the elements of a set $S$.

Then the set of sequences $X$ is pseudorandom with respect to $P$ if

  1. $\mathrm{epdf}(\{X_j^i : 1 \leq i \leq m,\ 1 \leq j \leq n\})$ converges in distribution to $P$

  2. $\mathrm{epdf}(\{R(C(X^i)) : 1 \leq i \leq m\})$ clusters around 1.

To be concrete regarding point 2, let's run an experiment on the compression ratio for 1000 sequences of 50 Python 3 $U(0,1)$ random numbers each:

%matplotlib inline
import matplotlib.pyplot as plt
import scipy.stats as st
import zlib, json

def C(X):
    # Serialize the sequence to JSON text and compress it;
    # return (uncompressed bytes, compressed bytes).
    Xstar = bytes(json.dumps(X.tolist()), 'UTF-8')
    return (Xstar, zlib.compress(Xstar))

def R(CX):
    # Compression ratio: compressed size / uncompressed size.
    return len(CX[1]) / len(CX[0])

D = st.uniform(0, 1)  # U(0,1) as stated above; the original snippet had st.norm(0,1), i.e. N(0,1)
(m, n) = (1000, 50)
X = D.rvs((m, n))
RX = [R(C(x)) for x in X]
plt.hist(RX, density=True, bins=50);  # 'normed' was removed from matplotlib; use density

The picture we get is:

[Histogram of the compression ratios RX]

This looks like a Skellam distribution for $k=0$ with a mean around 0.495. I would have expected it to be skewed to the right with a center closer to 1. A likely explanation for the low ratio: the JSON/ASCII encoding is itself redundant--each character is drawn from a small alphabet of digits and punctuation and so carries well under 8 bits of entropy--so zlib can roughly halve the size even when the underlying numbers are perfectly random; a ratio near 1 would only be expected for a maximally dense binary encoding. Either way the picture is evocative and interesting.