Expected time until a repeat in a sequence of infinitely many coin flips?


Consider flipping infinitely many coins and recording the results in a string $x_0 x_1 x_2 \dots$. Let $s_t$ denote the substring of length $n$ starting at position $t$.

Let $\tau_n = \min\{t > 0 : s_t = s_0\}$ be the random variable giving the first time at which the initial substring $s_0$ repeats. What is $\Bbb{E}(\tau_n)$?

It is easy to show $\Bbb{E}(\tau_1) = 2$: in the case that $n = 1$, what we are calculating is simply the average minimum $t > 0$ such that $x_t = x_0$, and we can take advantage of the independence of the coin flips.

The probability that $x_t = x_0$ and $x_i \ne x_0$ for all $0 < i < t$ is $\left(\frac12\right) \cdot \left( \frac12 \right)^{t-1}$, so $$\Bbb{E}(\tau_1) = \sum_{t=1}^\infty t \cdot \Pr(\tau_1 = t) = \sum_{t=1}^\infty t \cdot \left(\frac12\right) \cdot \left( \frac12 \right)^{t-1} = 2.$$
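As a quick numerical sanity check of this series (my addition, not part of the derivation), the partial sums can be evaluated exactly with rationals and seen to converge to $2$:

```python
from fractions import Fraction

# Partial sums of sum_{t>=1} t * (1/2)^t, evaluated exactly with rationals.
partial = Fraction(0)
for t in range(1, 100):
    partial += t * Fraction(1, 2) ** t

value = float(partial)  # the tail beyond t = 99 is negligibly small, so value is ~2
```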

Experimental data suggest $\Bbb{E}(\tau_n) = 2^n$. I considered an inductive argument: waiting for a repeat of $n+1$ flips requires first waiting for a repeat of the first $n$ flips, followed by an additional wait until the single $(n+1)$-th character also matches, a wait time perhaps equal in distribution to $\tau_1$, in which case $\Bbb{E}(\tau_{n+1}) = \Bbb{E}(\tau_n) \cdot \Bbb{E}(\tau_1)$?

I don't know how to make this rigorous though.
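For reference, here is one way the experiment could be run, a minimal Monte Carlo sketch (the function name `first_repeat_time` and the trial counts are my own choices):

```python
import random

def first_repeat_time(n, rng):
    """Flip coins until the initial length-n substring s_0 reappears;
    return the first t > 0 with x_t .. x_{t+n-1} == x_0 .. x_{n-1}."""
    x = [rng.randint(0, 1) for _ in range(n)]
    s0 = x[:n]
    t = 0
    while True:
        t += 1
        # extend the flip sequence lazily until position t + n - 1 exists
        while len(x) < t + n:
            x.append(rng.randint(0, 1))
        if x[t:t + n] == s0:
            return t

rng = random.Random(0)
trials = 20000
estimates = {n: sum(first_repeat_time(n, rng) for _ in range(trials)) / trials
             for n in (1, 2, 3, 4)}
# estimates[n] hovers near 2**n
```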


There are 4 best solutions below

---

Noting up front that this does not give a complete solution.

I think a direct calculation might be easier than an induction. After trying a couple of things, I found it easiest to use a trick sometimes called the bathtub formula, or sometimes the Darth Vader formula (ugh haha...), which states that \[X \geq_{\text{a.s.}} 0 \implies \mathbb{E}(X) = \int_0^\infty P(X>x) \, dx.\]

Using this formula, and noting that $\tau_n$ takes positive integer values so the integral becomes a sum, \begin{align*} \mathbb{E}(\tau_n) &= \sum_{t=0}^\infty P(\tau_n > t)\\ &= 1 + \sum_{t=1}^\infty P(\tau_n > t), \end{align*} where the $1$ in the last line appears since surely $\tau_n > 0$.

So, let's find \begin{align*} P(\tau_n > t) &= \prod_{k=1}^t P(s_k \neq s_0) \qquad\qquad\qquad (*)\\ &= \prod_{k=1}^t \left(1 - P(s_k = s_0)\right)\\ &= \prod_{k=1}^t \left(1 - 2^{-n}\right)\\ &= (1-2^{-n})^t. \end{align*}

Plugging back in and summing the geometric series, $\mathbb{E}(\tau_n) = 1 + \sum_{t=1}^\infty (1 - 2^{-n})^t = 1 + \frac{1 - 2^{-n}}{2^{-n}} = 1 + 2^{n} - 1 = 2^n$, which agrees with your experiments.

As mentioned in a comment, the line $(*)$ needs justification. Clearly $\{\tau_n > t\} = \bigcap_{k=1}^t \{ s_k \neq s_0 \}$, and we claim that these events are independent (which is surprising). Indeed, consider the joint distribution of just two of them. It can be broken down into a product of exclusive-or events $x_i \neq x_j$, which are independent: \begin{align*} P(s_1\neq s_0,s_2\neq s_0) &= P((x_1\neq x_0\cup x_2\neq x_1 \cup\dots\cup x_n\neq x_{n-1})\cap(x_2\neq x_0\cup x_3\neq x_1\cup\dots\cup x_{n+1}\neq x_{n-1})). \end{align*} I claim that for any $i\neq j$ and any $p,q$, the binary random variables $y_{i,p}=x_i\neq x_p$ and $y_{j,q}=x_j\neq x_q$ are independent. This is clearly true if $p\neq q$. If $p=q$, it follows because the xor operation $\neq$ preserves the independence of $x_i$ and $x_j$: if $x_p=1$ it just flips the bits, and if $x_p=0$ it does nothing. By exchanging the roles of $i,j$ and $p,q$, we see that $y_{i,p}$ and $y_{j,q}$ are also independent for any $i,j$ as long as $p\neq q$.

Well, this is as far as I've gotten with this method. To finish it off, we would need joint independence of the $y_{i,p}$ random variables, since $\{ s_t \neq s_0 \} = \{ \sum_{i=0}^{n-1} y_{t+i,i} > 0 \}$, and these sums are independent under joint independence of the $y_{i,p}$. Judging from comments on this answer, this is probably not going to work.
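To see concretely where $(*)$ breaks down, the $n = 2$ case can be enumerated exhaustively; this check (my addition, not from the answer above) shows that even just the two events $\{s_1 \neq s_0\}$ and $\{s_2 \neq s_0\}$ are not independent, although the expectation $2^n$ still comes out right:

```python
from itertools import product

# Exhaustive check of the n = 2 case over all 16 equally likely strings x0 x1 x2 x3.
n = 2
total = count_A = count_B = count_AB = 0
for x in product((0, 1), repeat=2 * n):
    s0, s1, s2 = x[0:n], x[1:1 + n], x[2:2 + n]
    A = s1 != s0   # the event {s_1 != s_0}
    B = s2 != s0   # the event {s_2 != s_0}
    count_A += A
    count_B += B
    count_AB += A and B
    total += 1

pA, pB, pAB = count_A / total, count_B / total, count_AB / total
# pA = pB = 3/4, but pAB = 5/8, not (3/4)^2 = 9/16: the events are dependent.
```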

---

Here are some thoughts which are too long for a comment, but do not yet give a full solution.

Let us consider the particular example of calculating $E(\tau_5 \mid x_\cdot=01001\ldots)$. As an auxiliary variable, let $Y$ be the random variable giving the starting index of the first occurrence of $01001$, and condition on the first four flips. Then we have: \begin{align*} E(Y \mid x_\cdot = 0000\ldots) & = 1 + \frac{1}{2} E(Y \mid x_\cdot = 0000\ldots) + \frac{1}{2} E(Y \mid x_\cdot = 0001\ldots); \\ E(Y \mid x_\cdot = 0001\ldots) & = 1 + \frac{1}{2} E(Y \mid x_\cdot = 0010\ldots) + \frac{1}{2} E(Y \mid x_\cdot = 0011\ldots); \end{align*} and so on, with the exceptional equation being: \begin{align*} E(Y \mid x_\cdot = 0100\ldots) & = \frac{1}{2} + \frac{1}{2} E(Y \mid x_\cdot = 1000\ldots), \end{align*} since a next flip of $1$ completes $01001$ at index $0$. This gives a system of 16 equations in the 16 unknowns $E(Y \mid x_\cdot = ????\ldots)$. Putting these equations into Octave and solving, I get the values $(33, 31, 25, 35, 17, 31, 33, 35, 33, 31, 25, 35, 33, 31, 33, 35)$ for the unknowns.

In particular, we now have: $$E(\tau_5 \mid x_\cdot = 01001\ldots) = 1 + E(Y \mid x_\cdot = 1001\ldots) = 1 + 31 = 32.$$


In my experimentation with different cases of this computation, I consistently seem to get $E(\tau_n \mid x_\cdot = A \ldots) = 2^n$ for every $n$-sequence $A$, which would be a stronger statement than the question's hypothesis $E(\tau_n) = 2^n$. (Though note, of course, that the overall distribution of $\tau_n$ is not independent of the first $n$ digits of $x$.)


It does seem very possible that if you instead consider the 32 unknowns $E(Y \mid x_\cdot = A\ldots)$ for the 32 five-character prefixes $A$, it might end up being simpler to reason about the resulting system of equations, since then the only exceptional equation would be $E(Y \mid x_\cdot = 01001\ldots) = 0$.

---

The probability for $n$ bits to form a particular string of length $n$ is $2^{-n}$. Thus, if we consider a string of $L$ bits, by linearity of expectation we expect $(L-n+1)2^{-n}$ occurrences of any particular string of length $n$. Taking $L$ to infinity, we find that the expected number of bits from one occurrence of any particular string of length $n$ to the next must be $2^n$. Thus, no matter what initial string of length $n$ the sequence starts with, the expected number of flips until that same string occurs again is $2^n$.
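The expected-count step can be checked exactly for a small case by enumeration (my addition; the particular $L$, $n$, and pattern below are arbitrary choices):

```python
from itertools import product

L, n = 10, 3
pattern = (0, 1, 0)                  # any fixed length-n pattern gives the same count
total_occurrences = 0
for x in product((0, 1), repeat=L):  # all 2**L equally likely strings
    total_occurrences += sum(x[i:i + n] == pattern for i in range(L - n + 1))

avg = total_occurrences / 2 ** L     # average occurrences per string
# equals (L - n + 1) * 2**-n = 8 * (1/8) = 1.0 exactly
```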

---

OK, I think the following is a fully rigorous proof, and it uses a (somewhat strong, but "intuitively obvious") theorem about Markov Chains.

The state of the system can be characterized by the last $n$ flips, i.e. the system is a Markov Chain with $2^n$ states. Any state can transition to exactly two states, depending on the newest flip, e.g. the $n=5$ case has $32$ states, and state $\color{red}{1}0011$ can transition to $0011\color{blue}{0}$ or $0011\color{blue}{1}$ where the blue digit represents the newest flip (and the red $\color{red}{1}$ is the oldest flip that has "aged out").

As usual let $\tau_{xy}$ denote the transition time starting from state $x$ and going to state $y$, so that $\tau_{xx}$ is the revisit time. As several people have pointed out, for different states $x \neq y$, the revisit times $\tau_{xx}$ and $\tau_{yy}$ are in general not equi-distributed random variables. E.g. for $x = 11111$, $\Pr(\tau_{xx} = 1) = 1/2$ (this happens if the next flip is $1$), whereas for $y = 10100$, $\Pr(\tau_{yy} = 1) = 0$. So it does come as a (mild?) surprise, at least to me, that even though the revisit times have all kinds of different distributions (related to a sort of minimal FSM detecting each pattern), they all have the same expected value:

Claim: For any state $x$, $E[\tau_{xx}] = 2^{n}$.

This follows almost directly from this (somewhat strong, but "intuitively obvious") theorem:

Theorem: For a positive recurrent Markov Chain with stationary distribution $\pi$, $E[\tau_{xx}] = 1 / \pi_x$ for any state $x$.

E.g. see Prop 2.6 of these notes or Theorem 1.19 of these notes or (for a slightly different version) Wikipedia.

The Markov Chain with the $2^n$ states is clearly recurrent, and any finite recurrent chain is positive recurrent, so the Theorem can be applied. (Indeed, the chain is also aperiodic, and ergodic, and all the good stuff.) It is also clear that the only stationary distribution is the uniform one with every $\pi_x = 2^{-n}$: each state has exactly two incoming transitions, each taken with probability $\frac12$, so the uniform distribution is preserved, and irreducibility makes it unique.
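As a small concrete check of the Claim and the Theorem together, here is a sketch (mine, for $n = 3$) that builds the $2^n$-state shift chain, verifies that the uniform distribution is stationary, and computes each expected return time directly from the hitting-time equations rather than from the Theorem:

```python
import itertools
import numpy as np

n = 3
states = ["".join(s) for s in itertools.product("01", repeat=n)]
N = len(states)
idx = {s: i for i, s in enumerate(states)}

# Shift chain: state abc moves to bc0 or bc1, each with probability 1/2.
P = np.zeros((N, N))
for s in states:
    for bit in "01":
        P[idx[s], idx[s[1:] + bit]] += 0.5

pi = np.full(N, 1 / N)
uniform_is_stationary = np.allclose(pi @ P, pi)   # True

# Expected return time to x: solve h_y = 1 + sum_z P[y,z] h_z for y != x
# (with h_x = 0), then E[tau_xx] = 1 + sum_z P[x,z] h_z.
return_times = []
for x in range(N):
    mask = np.arange(N) != x
    Q = P[np.ix_(mask, mask)]
    h = np.zeros(N)
    h[mask] = np.linalg.solve(np.eye(N - 1) - Q, np.ones(N - 1))
    return_times.append(1 + P[x] @ h)
# every entry of return_times is 8 = 2**3, despite the differing distributions
```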

The answer by @joriki implicitly assumed something like the Theorem above, so my answer can be considered a justification of the fact that, at least for positive recurrent Markov Chains (e.g. this problem), the Theorem indeed holds.