Why is more likely for a run of three 0s to be followed by a 1 than a 0 in a 50/50 random binary string?

263 Views Asked by At

Take a random 12 digit random binary string, each bit equiprobable 0 or 1. Select a bit that is preceded by 3 0s equiprobably at random. The probability that the bit is 1 is ~66%. Why? Why is this probability 50% if we calculate the sample average of all bits preceded by 3 0s instead of the sample average of a randomly selected bit that is preceded by 3 0s?

import numpy as np
import random

ntrials = 10000
n = 12
q = []
z = []

for _ in range(0, ntrials):
    r = np.random.randint(0,2,n)
    x = []
    for i in range(3, n):
        if np.all(r[i-3:i] == 0):
            x.append(r[i])
    if x:
        q.append(random.sample(x,1)[0])
        z.extend(x)

if q:
    print(np.mean(q)) # Why are these different?
    print(np.mean(z)) 
else:
    print(0)
3

There are 3 best solutions below

2
On BEST ANSWER

It is perhaps easier to enumerate $5$ bit cases

Strings with three $0$s followed by something are

00000, 00001, 00010, 00011, 10000, 10001

where the possible bits following a 000 are respectively

0 & 0, 0 & 1,     1,     1,     0,     1    

So you have a choice in setting up the two methods, affecting the probabilities:

  • take the simple average over each possible bit of $0,0,0,1,1,1,0,1$ to get $0.5$
  • take the average of the averages for each possible string of $0, 0.5,1,1,0,1$ to get about $0.5833$; this corresponds to conditioning on a string containing 000 and then choosing at random a eligible bit from that string

Why is the average of averages higher with $5$ or $12$ bits or other lengths greater than $4$? Because $1$ is more like to appear more often in strings alone or with few alternatives, while $0$ can more often appear multiple times in stings with more than three consecutive $0$s and averaging averages under weights these cases

0
On

Let's consider the case where you select the value following one 0 (same problem easier for my example). Let us consider a length 3 binary string. The equally likely string with at least one x and their corresponding output are the following:

$101-1\\ 011-1\\ 100 -0\\ 010-1\\ 001-\{0,1\}\\ 000-\{0,0\}$

$z = \{1,1,0,1,0,1,0,0\}$ which gives 0.5

The problem with random.sample(x,1)[0] is that the element $\{0,0\}$ is actually replaced with 0 in x. So that you underestimate the number of 1's. the element $\{1,0\}$ works fine because it is balanced.

This problem is very specific to the number of consecutive zero you look at and the length of the string. In the case where the number of 0s and the length of the string change you could have cases where the unbalanced sets offset each other (especially for n >> number of 0s).

3
On

I didn't look at your program, and what it tried to do. In any case your $60\%$ claim made in the text is simply wrong since it contradicts the assumed independence of the digits. (This dream of yours has been the ruin of thousands of gamblers $\ldots$)

If you are not convinced: The following little program looks at all binary strings of length $12$ and counts how often $111$ is followed by a $0$, resp., by a $1$. The resulting numbers are equal.

enter image description here