Yet another question on Expected number of flips to get 2 consecutive heads


So, there are quite a few solutions on the web to the problem of "what is the expected number of flips to get 2 consecutive heads for a fair coin". Many of these solutions use the conditional expectation formula $E(X) = \sum E(X|B_{i})P(B_{i})$, where $X$ is a random variable, $\{B_{i}\}$ is a collection of events partitioning the sample space, and $P$ is the corresponding probability.

In our situation, the number of flips is usually given by the solution of $$E(X) = \frac{1}{2} + \frac{1}{4}(E(X)+2) + \frac{1}{2}(E(X)+1).$$ The usual logic given is that if one gets a tail on the first flip, then the expected number is $E(X)+1$; if one gets a head followed by a tail on the first two flips, then it is $E(X)+2$; and if one obtains two heads straight away, then the number of flips is simply $2$.
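For reference, solving this linear equation (a routine algebra step, added here for completeness) yields the standard answer:

$$ E(X) = \frac{1}{2} + \frac{1}{4}E(X) + \frac{1}{2} + \frac{1}{2}E(X) + \frac{1}{2} = \frac{3}{4}E(X) + \frac{3}{2} \quad\Longrightarrow\quad E(X) = 6. $$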

No matter how much I read and think about the given solution, I am no closer to understanding the principle behind it. I don't understand the following two things:

1) The underlying assumption that one can simply treat $E(X)$ as a random variable, when by definition it isn't one.

2) The setup of the solution itself. We don't know $E(X)$ a priori, and we have no assumptions on the value it may take. So why, for example, if we get a tail on the first flip, does the expectation become $E(X)+1$? Why doesn't it simply stay $E(X)$? Surely, whether or not we get a tail first, $E(X)$ stays the same?

The outlined solution is a standard technique in the theory of Markov chains, and to fully appreciate it with mathematical rigor, we need a deeper look.

The main idea is that $X$ is completely determined by knowing the entire history of coin flips. More precisely, let $f$ be a function whose input is an entire history of coin tosses $y = (y_n)_{n\geq 1} \in \{H, T\}^{\mathbb{N}}$ and whose output is the first moment at which two $H$'s have appeared in a row. This can be conveniently written as

$$ f(y) = f(y_1, y_2, \cdots) = \min\{ n \geq 2 : (y_{n-1}, y_n) = (H, H)\} $$

with the convention that $\min \varnothing = \infty$. Up to this point, no probability theory is involved; we have simply defined a function $f$ which reads a value off its input. Mentally, you may picture $f$ as a machine which, when fed an infinite string of $H$'s and $T$'s, detects the first position at which the pattern $HH$ completes.
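For concreteness, this pattern-detecting machine can be sketched in Python (an illustration added here, not part of the original answer); the convention $\min \varnothing = \infty$ becomes `float('inf')` for a finite input containing no $HH$:

```python
def f(y):
    """Return the first (1-based) index n >= 2 with
    (y[n-1], y[n]) == ('H', 'H'), i.e. the first completion
    of the pattern HH; float('inf') if it never occurs."""
    prev = None
    for n, yn in enumerate(y, start=1):
        if prev == 'H' and yn == 'H':
            return n
        prev = yn
    return float('inf')  # min of the empty set, by the convention above

print(f("THTHH"))  # → 5
print(f("HH"))     # → 2
print(f("THT"))    # → inf
```

Note that `f` accepts any iterable over `{'H', 'T'}`, including an infinite generator, in which case it only terminates when the pattern actually occurs.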

Now let us feed $f$ with random coin tosses. More precisely, let $Y = (Y_n)_{n\geq 1}$ be a sequence of i.i.d. RVs with $\mathbb{P}(Y_n = H) = \mathbb{P}(Y_n = T) = 1/2$, representing an infinitely long record of fair coin flips. Then we may realize $X$ as

$$ X = f(Y). $$

So far, it seems that we have concocted a very complicated and indirect way of describing $X$. But this formulation will be helpful for understanding what is going on in the solution outlined in the OP.
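To make the identity $X = f(Y)$ concrete, here is a small Monte Carlo sketch (a Python illustration, not part of the original answer) that feeds the detector $f$ a stream of simulated fair flips; the empirical mean should sit near the known value $\mathbb{E}[X] = 6$:

```python
import random

random.seed(0)

def f(y):
    """First (1-based) index n >= 2 with (y[n-1], y[n]) == ('H', 'H');
    works on any iterable, including an infinite generator."""
    prev = None
    for n, yn in enumerate(y, start=1):
        if prev == 'H' and yn == 'H':
            return n
        prev = yn

def fair_flips():
    """An infinite stream Y_1, Y_2, ... of fair coin tosses."""
    while True:
        yield 'H' if random.random() < 0.5 else 'T'

# Each call f(fair_flips()) is one realization of X = f(Y).
n = 100_000
estimate = sum(f(fair_flips()) for _ in range(n)) / n
print(estimate)  # Monte Carlo estimate of E[X]; should land near 6
```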

The outlined solution goes by decomposing $\mathbb{E}[X]$ according to some initial outcomes and examining resulting terms separately:

\begin{align*} \mathbb{E}[X] &= \mathbb{E}[X \,|\, Y_1 = T] \mathbb{P}(Y_1 = T) \\ &\quad + \mathbb{E}[X \,|\, (Y_1,Y_2) = (H,T)] \mathbb{P}((Y_1,Y_2) = (H,T)) \\ &\quad + \mathbb{E}[X \,|\, (Y_1,Y_2) = (H,H)] \mathbb{P}((Y_1,Y_2) = (H,H)). \end{align*}

Let us focus on the first term. Let $y = (y_n)_{n\geq 1} \in \{H, T\}^{\mathbb{N}}$ be a sequence satisfying $y_1 = T$. Then

\begin{align*} f(y) = f(y_1, y_2, y_3, \cdots) = f(T, y_2, y_3, \cdots) = 1 + f(y_2, y_3, \cdots). \end{align*}

The last step holds because the pattern $HH$ has to be rebuilt from scratch as soon as a $T$ is encountered. Now, given $\{Y_1 = T\}$, plugging $y = Y$ gives

\begin{align*} \mathbb{E}[X \,|\, Y_1 = T] &= \mathbb{E}[f(T, Y_2, Y_3, \cdots) \,|\, Y_1 = T] \\ &= \mathbb{E}[1 + f(Y_2, Y_3, \cdots) \,|\, Y_1 = T] \\ &= \mathbb{E}[1 + f(Y_2, Y_3, \cdots)] \\ &= 1 + \mathbb{E}[f(Y_2, Y_3, \cdots)] \\ &= 1 + \mathbb{E}[X]. \end{align*}

There are two steps that deserve explanation. In the third step, we dropped the conditioning because $Y_1$ is independent of all the remaining flips. Intuitively, knowing the first coin flip $Y_1$ should never affect anything about the rest of the coin tosses $Y_2, Y_3, \cdots$. The other is the final step, which is the crux of the argument: it follows because $X = f(Y_1, Y_2, \cdots)$ and $f(Y_2, Y_3, \cdots)$ have the same distribution, even though they are not the same random variable.
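Assuming a fair coin, the identity $\mathbb{E}[X \,|\, Y_1 = T] = 1 + \mathbb{E}[X]$ can also be sanity-checked by simulation (a sketch added here, not part of the original answer): record the first flip alongside each realization of $X$, then compare the conditional average on $\{Y_1 = T\}$ with the unconditional one:

```python
import random

random.seed(2)

def sample_X_with_first_flip():
    """Draw one realization of the flip sequence; return (X, Y_1),
    where X is the flip count at which HH first completes."""
    flips, prev, first = 0, None, None
    while True:
        flips += 1
        y = 'H' if random.random() < 0.5 else 'T'
        if first is None:
            first = y
        if prev == 'H' and y == 'H':
            return flips, first
        prev = y

n = 200_000
draws = [sample_X_with_first_flip() for _ in range(n)]
e_x = sum(x for x, _ in draws) / n
given_t = [x for x, y1 in draws if y1 == 'T']
e_x_given_t = sum(given_t) / len(given_t)
print(e_x, e_x_given_t)  # the second should sit near the first plus 1
```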

For the other terms, the argument goes likewise:

\begin{align*} \mathbb{E}[X \,|\, (Y_1, Y_2) = (H, T)] &= \mathbb{E}[f(H, T, Y_3, Y_4, \cdots) \,|\, (Y_1, Y_2) = (H, T)] \\ &= \mathbb{E}[2 + f(Y_3, Y_4, \cdots) \,|\, (Y_1, Y_2) = (H, T)] \\ &= \mathbb{E}[2 + f(Y_3, Y_4, \cdots)] \\ &= 2 + \mathbb{E}[X], \end{align*}

and

\begin{align*} \mathbb{E}[X \,|\, (Y_1, Y_2) = (H, H)] &= \mathbb{E}[f(H, H, Y_3, Y_4, \cdots) \,|\, (Y_1, Y_2) = (H, H)] \\ &= \mathbb{E}[2 \,|\, (Y_1, Y_2) = (H, H)] \\ &= 2. \end{align*}

Combining everything, we recover the equation

$$ \mathbb{E}[X] = \frac{1}{2}(\mathbb{E}[X] + 1) + \frac{1}{4}(\mathbb{E}[X] + 2) + \frac{1}{4}(2) $$

as in OP.