Measuring degrees of randomness


Imagine, for simplicity's sake, that we have a set of numbers, each equal to either 0 or 1. Let's call each a bit. Rationally, if the set is completely random, and reasonably large, the probability should be quite high that each possible value (0 or 1) is equally represented, and this probability should grow with increasing size of the set, until, with infinite bits in the set, the chance each possible value is equally represented should be exactly 1.

However, with my current knowledge of set theory and statistics, I cannot think of a way to measure this probability, given a set size and number of contained values. Also, assuming we have found this function, it implicitly defines the set as perfectly random. If that were not so, how could one factor in "degree of randomness?" How is degree of randomness measured?


There are 2 best solutions below


You are interested in the proportion of $1$s in the set, that is $n/N$ where $N$ is the size of the set, and $n$ is the random number of $1$s.

Now $n$ is the sum of independent Bernoulli variables $x_i$ with parameter $p=1/2$, and you are interested in its deviations from the mean, that is, in the deviations of $n/N$ from $1/2$.

From the CLT, the deviations are of order $\sqrt{N}$ for the number of $1$s, that is, of order $\frac 1{\sqrt{N}}$ for the ratio:

$$ P\left(\left|\frac nN - \frac 12\right|\le\frac a{\sqrt{N}}\right)= P\left( \sqrt{N}\left|\frac 1N \sum_{i=1}^N x_i - Ex_1 \right|\le a\right)\to \int_{-a}^a \exp\left(-\frac{u^2}{2V}\right)\frac{du}{\sqrt{2\pi V}} $$ where $V=p(1-p)=1/4$ is the variance of each $x_i$. The probability of a deviation larger than $\frac a{\sqrt{N}}$ is one minus this integral.
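As a numerical check, here is a minimal Python sketch that compares the exact binomial probability that $n/N$ stays within $a/\sqrt{N}$ of $1/2$ with the Gaussian integral over $[-a, a]$, which for $V=1/4$ equals $\operatorname{erf}(a\sqrt{2})$. The function names `exact_within` and `clt_within` are illustrative, not from the answer, and fair independent bits are assumed throughout.

```python
import math

def exact_within(N, a):
    """Exact P(|n/N - 1/2| <= a/sqrt(N)) for n ~ Binomial(N, 1/2)."""
    half_width = a * math.sqrt(N)
    lo = max(0, math.ceil(N / 2 - half_width))
    hi = min(N, math.floor(N / 2 + half_width))
    # Sum the binomial coefficients as exact integers, divide once at the end.
    favourable = sum(math.comb(N, k) for k in range(lo, hi + 1))
    return favourable / 2 ** N

def clt_within(a, V=0.25):
    """Gaussian integral over [-a, a] for a centred normal with variance V."""
    return math.erf(a / math.sqrt(2 * V))

def exact_equal(N):
    """Exact probability that 0s and 1s are equally represented (N even)."""
    return math.comb(N, N // 2) / 2 ** N

if __name__ == "__main__":
    for N in (100, 1000, 10000):
        print(N, exact_within(N, 1.0), clt_within(1.0), exact_equal(N))
```

The first two printed columns should approach each other as $N$ grows. The last column illustrates a related point: the probability that the two values are represented exactly equally behaves like $\sqrt{2/(\pi N)}$ and shrinks with $N$; it is the ratio $n/N$ that concentrates around $1/2$.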


The first answer is perfect. I'll only add a complement on this part of the question: how is degree of randomness measured?

Instead of computing the variance of the whole data set, which is in some cases intrinsically difficult, one can pick random samples, compute their variances, and then compute the variance of the sampled variances. In some cases where speed is critical, simply counting the number of set bits in small samples may give accurate information about the quality of the whole set of random data.

This method can be applied dynamically to build a confidence-versus-time curve that can be used in real-time (or offline) analysis.
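A minimal Python sketch of the sampling idea, under assumptions not stated in the answer: samples are contiguous blocks of `sample_size` bits chosen at random positions, and the reference values are those of fair independent bits (mean count `sample_size / 2`, variance `sample_size / 4`). The names `sample_bit_counts` and `randomness_summary` are hypothetical.

```python
import random
import statistics

def sample_bit_counts(bits, sample_size, num_samples, rng):
    """Count the set bits in randomly positioned contiguous samples."""
    counts = []
    for _ in range(num_samples):
        start = rng.randrange(len(bits) - sample_size + 1)
        counts.append(sum(bits[start:start + sample_size]))
    return counts

def randomness_summary(bits, sample_size=64, num_samples=500, seed=0):
    """Compare observed per-sample bit counts with fair-coin expectations."""
    rng = random.Random(seed)
    counts = sample_bit_counts(bits, sample_size, num_samples, rng)
    return {
        "mean_count": statistics.fmean(counts),
        "expected_mean": sample_size / 2,        # assumes p = 1/2
        "count_variance": statistics.pvariance(counts),
        "expected_variance": sample_size / 4,    # assumes independent bits
    }

if __name__ == "__main__":
    source = random.Random(1)
    random_bits = [source.getrandbits(1) for _ in range(100_000)]
    alternating = [i % 2 for i in range(100_000)]
    print(randomness_summary(random_bits))
    print(randomness_summary(alternating))
```

The alternating sequence has a perfectly balanced proportion of $1$s, yet its per-sample variance is zero instead of the expected `sample_size / 4`, so looking at the spread of the sample counts catches a defect that the overall proportion alone would miss.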