Computing the average number of times that n non-overlapping heads occur

80 Views Asked by At

Considering a sequence of $100$ coin flips, how can I compute $z(n)$, the average number of times that $n$ non-overlapping heads occur? (For example, six consecutive heads count as exactly two occurrences of three consecutive heads.)

If I understand well:

For example: "THHHTTTHHHHTHTH..."

Breaking down this example, we have $2$ occurrences of $3$ consecutive heads: "HHH", "HHH".

The probability of getting $6$ consecutive heads is $(1/2)^6 = 1/64$. But if this counts as $2$ occurrences of $3$ consecutive heads, then the probability to $z(3)$ is $2 * (1/64) = 1/32$.

2

There are 2 best solutions below

0
On

My initial analysis, which I left as a comment, computed the expectation as $~\displaystyle \frac{98}{8}.~$

I was basing this on Linearity of Expectation, which includes a proof that the formula applies even when the events are not independent of each other.

My idea was that from $~100~$ coin flips, there are $~98~$ opportunities for 3 consecutive heads, and each opportunity has a $~1/8~$ probability of occurring.

However, as indicated by Henry's comment, following the posting, I overlooked that the OP (i.e. original poster) was not enumerating overlaps as I was. That is, both 4 consecutive heads and 5 consecutive heads are to be counted once only, while 6 consecutive heads is to count as 2 occurrences of 3 consecutive heads.

So, I deleted my original comment.


However, I just thought of a way of salvaging my original approach. The computation of $~98/8~$ is accurate, except that (for example) it fails to adjust for the possibility of either 4 or 5 consecutive heads.

There are $~97~$ opportunities for $~4~$ consecutive heads, each of which has a probability of $~1/16~$ of occurring.

So, the first revision of my original computation is

$$\frac{98}{8} - \frac{97}{16}. \tag1 $$

Superficially, it would (wrongly) appear that the only thing wrong with the computation in (1) above is that it deducts each occurrence of 5 consecutive heads twice, rather than once.

So, a reasonable second revision is

$$\frac{98}{8} - \frac{97}{16} + \frac{96}{32} = \frac{294}{32} = \frac{147}{16} = 9.1875. \tag2 $$


The inaccuracy inherent in the computation in (2) above reflects that I am mis-managing exactly 7 consecutive heads, mis-managing exactly 8 consecutive heads, and so on. So, my overall approach must be altered.

For $~n \in \{3,4,5,\cdots,100\},~$ and for $~k \in \{1,2,\cdots, 101 - n\},~$ let $~f(n,k)~$ denote the probability of exactly $~n~$ consecutive heads occurring, with the first Head occurring at coin toss $~k.~$

This means that $~f(3,1) = 1/16,~$ since the first $~4~$ coin tosses must be HHHT. Similarly, $~f(3,2) = 1/32,~$ since the first $~5~$ coin tosses must be THHHT.

Each occurrence of exactly $~3,~4,~$ or $~5~$ consecutive heads is to be enumerated once. Each occurrence of exactly $~6,~7,~$ or $~8~$ consecutive heads is to be enumerated twice, and so forth.

So, using Linearity of Expectation, the overall expectation is

$$1 \times \left\{ ~\sum_{n=3}^5 \left[\sum_{k=1}^{101 - n} f(n,k)\right] ~\right\} \\ + 2 \times \left\{ ~\sum_{n=6}^8 \left[\sum_{k=1}^{101 - n} f(n,k)\right] ~\right\} \\ + 3 \times \left\{ ~\sum_{n=9}^{11} \left[\sum_{k=1}^{101 - n} f(n,k)\right] ~\right\} \\ + \cdots. \tag3 $$

The overall expression in (3) above can be attacked by finding a pattern in the computations, and then generalizing to a final formula.


$\underline{\text{Computation of} ~\displaystyle \sum_{k=1}^{101 - n} f(n,k) ~: ~n = 3}$

$~\displaystyle f(3,1) = \frac{1}{16}.$

For $~\displaystyle 2 \leq k \leq 97, ~f(3,k) = \frac{1}{32}.~$

$~\displaystyle f(3,98) = \frac{1}{16} ~$ (i.e. last 4 tosses must be THHH).

Therefore,

$$\sum_{k=1}^{101 - 3} f(3,k) = \frac{96}{32} + \left[2 \times \frac{1}{16} \right] = \frac{96 + 4}{32} = \frac{100}{32}.$$


$\underline{\text{Computation of} ~\displaystyle \sum_{k=1}^{101 - n} f(n,k) ~: ~n = 4}$

$~\displaystyle f(4,1) = \frac{1}{32}.$

For $~\displaystyle 2 \leq k \leq 96, ~f(4,k) = \frac{1}{64}.~$

$~\displaystyle f(4,97) = \frac{1}{32} ~$ (i.e. last 5 tosses must be THHHH).

Therefore,

$$\sum_{k=1}^{101 - 4} f(4,k) = \frac{95}{64} + \left[2 \times \frac{1}{32} \right] = \frac{95 + 4}{64} = \frac{99}{64}.$$


$\underline{\text{Computation of} ~\displaystyle \sum_{k=1}^{101 - n} f(n,k) ~: ~4 < n \leq 98}$

$~\displaystyle f(n,1) = \frac{1}{2^{n+1}}.$

For $~\displaystyle 2 \leq k \leq (100 - n), ~f(n,k) = \frac{1}{2^{n+2}}.~$

$~\displaystyle f(n,101 - n) = \frac{1}{2^{n+1}}.$

Therefore,

$$\sum_{k=1}^{101 - n} f(n,k) = \frac{99 - n}{2^{n+2}} + \left[2 \times \frac{1}{2^{n+1}} \right] = \frac{103 - n}{2^{n+2}}.$$


$\underline{\text{Final Computation}}$

(3) above may now be generalized to

$$1 \times \left[ ~\frac{100}{2^5} + \frac{99}{2^6} + \frac{98}{2^7} ~\right] \\ + 2 \times \left[ ~\frac{97}{2^8} + \frac{96}{2^9} + \frac{95}{2^{10}} ~\right] \\ + 3 \times \left[ ~\frac{94}{2^{11}} + \frac{93}{2^{12}} + \frac{92}{2^{13}} ~\right] \\ + \cdots \\ + 32 \times \left[ ~\frac{7}{2^{98}} + \frac{6}{2^{99}} + \frac{5}{2^{100}} ~\right] \\ + 33 \times \left[ ~\frac{2}{2^{100}} + \frac{1}{2^{100}} ~\right].$$

3
On

Let $z_n(m)$ be the expected number of non-overlapping runs of $n$ heads in the next $m$ coin flips. More generally, let $y_n(k, m)$ be the expected number of non-overlapping runs of $n$ heads in the next $m$ coin flips given that we need $0<k\le n$ heads to complete the current run. Then

  • $z_n(m)=0$ for $m < n$;
  • $z_n(m)=y_n(n, m)$;
  • for $k > 1$, $y_n(k, m)=\frac{1}{2}y_n(k-1, m-1) + \frac{1}{2}z_n(m-1)$; and
  • $y_n(1, m)=\frac{1}{2} + z_n(m-1)$.

These can be put together into a simple script:

   def z(n, m):
     if m < n: return 0
     return y(n, n, m)
   
   def y(n, k, m, cache={}):
     kk = (n, k, m)
     if kk in cache: return cache[kk]
     if k == 1:
       ret = Rational(1, 2) + z(n, m-1)
     else:
       ret = Rational(1, 2) * (y(n, k-1, m-1) + z(n, m-1))
     cache[kk] = ret
     return ret

When I evaluate $z_3(100)$, I get $$ \frac{8925295042423247826864542976627}{1267650600228229401496703205376} \approx 7.04082 . $$ This is between the lower bound of $33/8=4.125$ (considering $33$ non-overlapping runs of $3$, each with a probability $1/8$ of being "HHH") and the upper bound of $98/8=12.25$ (considering all possible runs of $3$, but allowing overlaps), as it should be.


Update:

It's actually a bit simpler to express the recursion in terms of $z_n(m)$ alone. For instance, for $n=3$, we have

$$ \begin{eqnarray} z_3(m) &=& y_3(3,m) \\ &=& \frac{1}{2}z_3(m-1) + \frac{1}{2}y_3(2,m-1) \\ &=& \frac{1}{2}z_3(m-1) + \frac{1}{4}z_3(m-2) + \frac{1}{4}y_3(1,m-2) \\ &=& \frac{1}{2}z_3(m-1) + \frac{1}{4}z_3(m-2) + \frac{1}{4}z_3(m-3) + \frac{1}{8}. \end{eqnarray} $$ More generally, $$ z_n(m) = \sum_{k=1}^{n} \frac{z_n(m-k)}{2^k} + \frac{z_n(m-n) + 1}{2^{n}}. $$ It seems like this should admit a generating-function solution.