Why does randomness exhibit a pattern in the long run?


Layman here, so please avoid complex math in answers.

Random (usually pseudorandom) events are typically characterized along these lines:

  1. Each outcome in a repeated experiment must be i.i.d. (independent and identically distributed); i.e. it has no effect on subsequent outcomes, so individual outcomes cannot be predicted from past data, as there is no causal link between trials
  2. Large sequences of outcomes are predictable, because they exhibit stabilizing relative frequencies, such that no individual outcome is "preferred" or comes to dominate the rest (the sketch below shows this stabilization)
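For concreteness, here is a minimal sketch of point 2 (the seed and checkpoints are arbitrary choices of mine): flip a simulated fair coin repeatedly and watch the running proportion of heads settle toward $1/2$, even though no single flip is predictable.

```python
import random

random.seed(0)                      # arbitrary seed, for a reproducible run
heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    heads += random.random() < 0.5  # one fair-coin flip
    if n in checkpoints:
        print(f"after {n:>6} flips: proportion of heads = {heads / n:.4f}")
```

Each individual flip remains unpredictable; only the running proportion stabilizes.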

The prevailing view in probability theory (frequentism) is that stabilizing relative frequencies are an objective phenomenon, independent of human thought. This assumption has served statisticians, casinos and insurance companies well. In practice it means that large sequences of similar random events behave consistently, so their averages can be confidently predicted within a "sufficiently" large sample.

Why can we predict the averages of large samples of individually unpredictable random events?


There are 3 best solutions below

Solution 1

Note that as empirical scientists, physicists don't really need any more than the ability to make testable predictions to justify a theory.

According to our current understanding, the most basic laws of nature (quantum theories) are probabilistic. As far as we know, that's just how the world works, and if there's a deeper reason behind it, we haven't been able to find it.

The prime example of a probabilistic system due to quantum effects is radioactive decay.
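To see how a smooth law can emerge from individually unpredictable decays, here is a toy sketch (the population size and per-step decay probability are invented for illustration): each simulated atom decays at a random moment, yet the surviving population tracks the deterministic decay law $N_0(1-p)^t$ closely.

```python
import math, random

random.seed(1)
n0, p = 100_000, 0.001        # made-up: initial atom count, per-step decay chance
# Sample each atom's random decay step from a geometric distribution.
lifetimes = [math.ceil(math.log(1 - random.random()) / math.log(1 - p))
             for _ in range(n0)]
for t in range(0, 5001, 1000):
    surviving = sum(life > t for life in lifetimes)
    expected = n0 * (1 - p) ** t          # smooth deterministic decay law
    print(f"t={t:>4}: surviving={surviving:>6}  law predicts ~{expected:>9.0f}")
```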

At macroscopic levels, there are several ways for probabilities to emerge; here are two that come to mind:

First, we can have a family of interacting periodic systems. If you look at them only for a short time, they appear random; but if you wait for the least common multiple of the periods (or any "long enough" time), they behave perfectly regularly.
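A minimal sketch of this idea, using three made-up on/off oscillators: over a short window their sum looks patternless, but the combined signal repeats exactly with period equal to the least common multiple of the individual periods.

```python
from math import lcm

periods = [7, 11, 13]          # made-up periods of three on/off oscillators
P = lcm(*periods)              # the combined signal repeats every lcm = 1001 steps

def signal(t):
    # Sum of simple square waves: each is "on" for the first half of its period.
    return sum((t % p) < p // 2 for p in periods)

print("first 20 samples:", [signal(t) for t in range(20)])      # looks irregular
print(f"exactly periodic with period {P}:",
      all(signal(t) == signal(t + P) for t in range(2 * P)))
```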

A second example would be chaotic systems, which are effectively unpredictable - but this does not mean that you're equally likely to find the system in any particular region of phase space.
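The logistic map $x \mapsto 4x(1-x)$ is a standard example of this: in the quick sketch below (seed chosen arbitrarily), individual iterates are effectively unpredictable, yet the long-run visit frequencies are stable and strongly non-uniform, piling up near the edges of $[0,1]$.

```python
# Logistic map x -> 4x(1-x): individual iterates are effectively
# unpredictable, but the long-run visit frequencies are stable.
x = 0.123456                          # arbitrary starting point
counts = [0] * 10                     # ten equal bins covering [0, 1)
for _ in range(1_000_000):
    x = 4 * x * (1 - x)
    counts[min(int(x * 10), 9)] += 1
for i, c in enumerate(counts):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {c / 1_000_000:.3f}")
```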

Now, what happens when you have a large number of such systems? Here, results like the law of large numbers and the central limit theorem apply. From a practical point of view, in many cases you end up with either random noise or a bell curve.
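As a rough illustration of the central limit theorem (the sample sizes and bin edges are arbitrary choices): averaging 50 independent uniform draws, many times over, produces a histogram that is unmistakably bell-shaped around 0.5.

```python
import random

random.seed(2)
# Average 50 independent uniform(0,1) draws, 50,000 times over; the
# averages pile up in a bell shape around 0.5.
means = (sum(random.random() for _ in range(50)) / 50 for _ in range(50_000))
bins = [0] * 20                            # 20 bins of width 0.02 covering [0.3, 0.7)
for m in means:
    bins[max(0, min(int((m - 0.3) / 0.02), 19))] += 1
for i, c in enumerate(bins):
    print(f"{0.3 + 0.02 * i:.2f} " + "#" * (c // 200))
```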

Now, let's look at a particular example: tossing a coin.

Let's assume we're operating under ideal conditions, so that the result of any toss depends on only two parameters: the coin's spin and its velocity.

The result will be periodic in both initial conditions, and if you overlay a source of noise, it's easy to see how you can end up with 50/50 probabilities if the parameters are right. The randomness of the coin toss is pushed back to the randomness of the noise.
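Here is a toy version of that model (the spin rate, launch speed, and noise levels are all invented for illustration): the coin, starting heads-up, lands heads iff it completes an even number of half-turns during its flight time of $2v/g$. With modest noise on both parameters, the outcome lands very close to 50/50.

```python
import random

random.seed(3)
G = 9.81                                   # gravity, m/s^2

def toss(spin_hz, v_mps):
    # Deterministic toy model: the coin spins at spin_hz revolutions/s for a
    # flight time of 2v/g, and lands heads iff it completes an even number
    # of half-turns (starting heads-up).
    half_turns = int(2 * spin_hz * (2 * v_mps / G))
    return half_turns % 2 == 0

# One "intended" toss (20 rev/s, 2.5 m/s), repeated with small random
# hand-to-hand noise on both parameters.
n = 100_000
heads = sum(toss(random.gauss(20, 1), random.gauss(2.5, 0.2)) for _ in range(n))
print(f"fraction heads: {heads / n:.3f}")  # lands very close to 0.5
```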

Solution 2

The single characteristic required for the emergence of the Law of Large Numbers (LLN) is that the various random events are independently random (or at least sufficiently so).

If I flip a coin once and then observe it repeatedly, those observations won't be independent. They'll be random, for sure: I cannot predict the result of the 2nd observation up front, but after observing the coin once I can predict the result of all future observations. The LLN does not hold here.

But if I flip the coin as little as once every 1000 observations, then those observations are already sufficiently independent for the LLN to kick in. After all, the first observation now predicts only a minuscule fraction of the next million observations.
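A small sketch contrasting the two schemes (the sample sizes are arbitrary): observing one flip a million times leaves the running average stuck at 0 or 1, while re-flipping every 1000 observations already brings it close to $1/2$.

```python
import random

random.seed(4)
N = 1_000_000

# Scheme A: flip once, then observe that same coin N times.
coin = random.random() < 0.5
avg_a = sum(coin for _ in range(N)) / N    # every observation is identical

# Scheme B: re-flip only once every 1000 observations.
total = 0
for i in range(N):
    if i % 1000 == 0:
        coin = random.random() < 0.5
    total += coin
avg_b = total / N

print(f"flip once, observe N times:  average = {avg_a:.3f}  (stuck at 0 or 1)")
print(f"re-flip every 1000:          average = {avg_b:.3f}  (near 1/2)")
```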

Solution 3

Toss a fair coin. For a single drawing, the proportion of heads is distributed as follows (over the outcomes $0$ and $1$):

$$\frac12, \frac12$$

For ten drawings in a row, the number of heads (from $0$ to $10$) is distributed as follows; each of the $2^{10} = 1024$ possible sequences is equally likely, and $\binom{10}{k}$ of them contain exactly $k$ heads:

$$\frac{1}{1024}, \frac{10}{1024}, \frac{45}{1024}, \frac{120}{1024}, \frac{210}{1024}, \frac{252}{1024}, \frac{210}{1024}, \frac{120}{1024}, \frac{45}{1024}, \frac{10}{1024}, \frac{1}{1024}$$

The probability of very skewed outcomes, like 10 heads in a row, decreases as the sequence gets longer, while the more "compliant" outcomes near the mean become much more likely.

So completely deviating means are still possible, but with smaller and smaller probability. The probability distribution always tends to concentrate around the mean. (The standard deviation of the sample proportion is divided by the square root of the sequence length.)
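To put numbers on that concentration, here is a short exact computation (the deviation threshold of 0.1 is an arbitrary choice) using the same binomial probabilities as above: the chance that the proportion of heads strays at least 0.1 from $1/2$ collapses as the number of tosses grows.

```python
from math import comb

# Exact probability that the proportion of heads in n fair tosses
# deviates from 1/2 by at least 0.1.
for n in (10, 100, 1000):
    p_far = sum(comb(n, k) for k in range(n + 1)
                if abs(k / n - 0.5) >= 0.1) / 2 ** n
    print(f"n={n:>4}: P(|heads/n - 1/2| >= 0.1) = {p_far:.10f}")
```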