A coin is tossed $10$ times. Find the probability that there exist $7$ consecutive coin tosses with at least $5$ out of the $7$ being heads.

319 Views Asked by At

A coin is tossed $10$ times. Find the probability that there exist $7$ consecutive coin tosses with at least $5$ out of the $7$ being heads.

So for example, $TTHHTHHTHH$ is one of the outcomes we want.

I guess the best way to treat this is as a counting problem; finding the number of outcomes we want and then dividing by $2^{10}.$

$2^{10} = 1024,$ so listing all the outcomes is time-wise expensive.

The events, "The first consecutive $7$ tosses contain at least $5$ heads", "The second consecutive $7$ tosses contain at least $5$ heads", etc. are not mutually exclusive and so the answer is not simply:

$P$(The first consecutive $7$ tosses contain at least $5$ heads) + $P$(The second consecutive $7$ tosses contain at least $5$ heads) + ... .

Similarly, the events, "The first consecutive $7$ tosses does not contain at least $5$ heads", "The second consecutive $7$ tosses does not contain at least $5$ heads", etc. are also not mutually exclusive and so the answer is not simply:

$1 - $ $[P$(The first consecutive $7$ tosses does not contain at least $5$ heads) + $P$(The second consecutive $7$ tosses does not contain at least $5$ heads) + ...$]$ .

Edit: What about reflecting the $10$ boxes down the middle? This could cut our work in half maybe? For example, $HTTHHTHTHH \equiv HHTHTHHTTH$. Perhaps figuring out the symmetries is expensive too.

I'm also interested in doing a similar problem with larger numbers, e.g.:

A coin is tossed $10^{14}$ times. Find the probability that there exist $1000$ consecutive coin tosses with at least $650$ out of the $1000$ being heads.

This is probably impractical to calculate using binomial distributions, so how would you find an answer using the Normal distribution as an approximation, or is it not possible to do this?

The reason I'm interested in this latter question is that it is the sort of calculation one might make if one wanted to gain statistical evidence that a poker site is rigged against them, although of course the latter question would not be enough evidence to prove a poker site is rigged against a particular player; it could be a reasonable starting point for further calculations. Also, it is not hard to imagine this calculation could have applications in other areas, statistical mechanics or mathematical biology for example.

5

There are 5 best solutions below

2
On

Treat the following as a long-comment: Inclusion-exclusion should work with the suggested events, namely, the event that is of interest is $P(A_1 \cup A_2 \cup A_3 \cup A_4)$ where $A_i$ indicates the tosses $(x_i, ..., x_{i+6})$ contain 5 Hs.

We note that $P(A_i)$ is $(\binom{7}{5} + \binom{7}{6} + \binom{7}{7}) \frac{1}{2^7}$. Now, one has to compute $P(A_i \cap A_{i+1}), P(A_i \cap A_{i+2})$, and $P(A_i \cap A_{i+3})$. For this, the best decision seems to be to condition on the number of heads in the intersections of their intervals, so for $P(A_i \cap A_{i+1})$, the intersecting interval of size 6 has to contain 4,5, or 6 heads, giving the form $P(A_i \cap A_{i+1}) = \binom{6}{4} \frac{1}{2^8} + \binom{6}{5}\frac{1}{2^6} + \binom{6}{6}\frac{1}{2^6}$. The other two are harder, but the same idea should work.

For the three-way and the last one, looking at the number of heads in spots (4,5,6,7), and then just placing an appropriate number of heads around might be the way to go. At that point, I would count.

EDIT: Glad this was done by another answerer; it looks fairly technical to actually do. Anyway, let me answer the second half of your question; your second question can be answered with bonferroni inequalities (think of cutting inclusion-exclusion); for the one you have there, the union bound already gives you that that is a one in a million event. (the probability of a 1000 tosses having 650 heads is $10^{-22} $ so multiply by $10^{14}$ for the union bound. If you wanted to find the probability of that 650 having heads using Gaussians, you can approximate $P(X > 650)$ by $P(2(X - 500)/\sqrt{1000} > 300/ \sqrt{1000})$ and take that first one to be a Gaussian with variance 1. (This gives a probability of $10^{-41}$ though, so I could be doing something wrong.)

0
On

Addendum added that shows the direct approach, as opposed to the initial answer which uses Inclusion-Exclusion.


As you analyzed, I will assume that the denominator is $(2^{10})$, so that the problem is reduced to a counting problem.

There are 4 sub-sequences to consider:
$S_1=[1-7], S_2=[2-8], S_3=[3-9], S_4=[4-10].$
I will use Inclusion-Exclusion by enumerating the number of ways that there is at least one good sequence, deducting back the number of ways that there are at least two good sequences, and so forth.

My final answer will be $\displaystyle T_1 - T_2 + T_3 - T_4$.

For $S_1$ to be good, it must contain either $5,6,$ or $7$ heads.
This can occur in $\binom{7}{0} + \binom{7}{1} + \binom{7}{2} = 29$ ways.
When examining $S_1$, there are then 3 coin tosses unconstrained.
This means that examination of $S_1$, by itself computes to
$29 \times 2^3 = 232.$

Further, when enumerating $T_1$ you have a choice of $\binom{4}{1}$ sequences to consider. Therefore,

$$T_1 = 232 \times 4 = 928.$$


The enumeration of $T_2$ will be trickier. Of the $\binom{4}{2}$ ways of selecting 2 of the 4 intervals, you know by symmetry that $S_1 + S_3$ will enumerate the same as $S_2 + S_4$. Also, $S_1 + S_2$ will enumerate the same as both $S_2 + S_3$, and $S_3 + S_4$. Therefore, the 6 partial sums will require 3 separate analyses.

These analyses will be represented by $A_1, A_2,$ and $A_3$.

$A_1: S_1 + S_2$
One possibility is a heads in both [1] and [8], and either 4, 5, or 6 heads in slots [2-7].
This computes to $\binom{6}{4} + \binom{6}{5} + \binom{6}{6} = 22.$
So, a partial sum for $A_1$ is
$A_{1a} = 22.$

The other possibility is that at least 1 of [1] and [8] is tails, and that there are either 5, or 6 heads in [2-7].
There are 3 ways of not having a heads in both [1] and [8].
There are $\binom{6}{5} + \binom{6}{6} = 7$ ways that 5 or 6 heads will occur in [2-7].
So, a 2nd partial sum for $A_1$ is
$A_{1b} = 3 \times 7 = 21.$

Further, when enumerating $A_1$, you have to consider that [9-10] are unconstrained. Therefore, $A_1 = (2^2) \times (22 + 21) = 172.$


$A_2: S_1 + S_3$

$A_{2a}:$ [1,2],[8,9] $=$ HHHH, [3-7] = 3-5 heads.
$A_{2a}: (1) \times (1 + 5 + 10) = 16.$

$A_{2b}:$ Both [1,2] and [8,9] have at least 1H each, but case $A_{2a}$ is excluded.
[3-7] = 4-5 heads.
$A_{2b}: (8) \times (1 + 5) = 48.$

$A_{2c}:$ At least one of [1,2] and [8,9] are all tails. [3-7] = 5 heads.
$A_{2c}: (7) \times (1) = 7.$

Further, when enumerating $A_2$, you have to consider that [10] is unconstrained. Therefore, $A_2 = (2^1) \times (16 + 48 + 7) = 142.$


$A_3: S_1 + S_4$

$A_{3a}:$ [1-3],[8-10] each have 3 heads. [4-7] has 2,3, or 4 heads.
$A_{3a} = (1) \times (1 + 4 + 6) = 11.$

$A_{3b}:$ [1-3],[8-10] each have at least 2 heads, but case $A_{3a}$ is excluded.
[4-7] has 3,4 heads.
$A_{3b} = (15) \times (1 + 4) = 75.$

$A_{3c}:$ [1-3],[8-10] each have at least 1 head, but cases $A_{3a}$ and $A_{3b}$ are excluded.
[4-7] has 4 heads.
$A_{3c} = (33) \times (1) = 33.$

$A_3 = (11 + 75 + 33) = 119.$


$T_2$ can now be enumerated as :

$$T_2 = (3 \times A_1) + (2 \times A_2) + A_3 = (3 \times 172) + (2 \times 142) + 119 = 919.$$


The enumeration of $T_3$ will also be tricky. Of the $\binom{4}{3}$ ways of selecting 3 of the 4 intervals, you know by symmetry that $S_1 + S_2 + S_3$ will enumerate the same as $S_2 + S_3 + S_4$. Also, $S_1 + S_2 + S_4$ will enumerate the same as $S_1 + S_3 + S_4$. Therefore, the 4 partial sums will require 2 separate analyses.

These analyses will be represented by $B_1,$ and $B_2$.

$B_1: S_1 + S_2 + S_3$
First, I will compute the number of ways that $S_1$ and $S_3$ are both satisfied. Then, I will adjust for the possibility that $S_1, S_3$ are satisfied, while $S_2$ is not. Only then, will I adjust for [10] being unconstrained.

The necessary intermediate analysis has already been done,
since $S_1 + S_3$ [without the $(2^1)$ scaling factor] was computed while computing $A_2$ in a previous section.
The incomplete running total for $B_1$ is
$A_{2a} + A_{2b} + A_{2c} = 71.$

In order for $S_1$ and $S_3$ to both be satisfied, and $S_2$ to be bad, the specific slots [1],[2],[8],[9] must be HTTH. Further, [3-7] must have exactly 4 heads. This can happen in $\binom{5}{1} = 5$ ways.

Therefore, the running total for $B_1$ is now $(71 - 5) = 66.$
To complete the computation of $B_1$ you must now apply the scalar of $(2^1)$, since [10] is unconstrained. Therefore,

$B_1 = (2^1) \times 66 = 132.$


$B_2: S_1 + S_2 + S_4$
First, I will compute the number of ways that $S_1$ and $S_4$ are both satisfied. Then, I will adjust for the possibility that $S_1, S_4$ are satisfied, while $S_2$ is not.

The necessary intermediate analysis has already been done,
since $A_3$ was computed in a previo0us section.
Therefore, the incomplete running total for $B_2$ is

$B_2 = (119)$.

In order for $S_1$ to be satisfied and $S_2$ to be bad, the specific slots [1],[8] must be HT.

In order for $S_4$ to be satisfied and $S_2$ to be bad, [9,10] must have at least one more heads than [2,3].

Putting this all together, the possibilities for [1,2,3,8,9,10] are
HTTTHH : [4-7] = 4 heads : (1) way.
HTHTHH : [4-7] = 3 heads : (4) ways.
HHTTHH : [4-7] = 3 heads : (4) ways.
HTTTHT : [4-7] = 4 heads : (1) way.
HTTTTH : [4-7] = 4 heads : (1) way.

Therefore, the final computation for $B_2$ is
$B_2 = 119 - 11 = 108$.

Therefore

$$T_3 = (2 \times B_1) + (2 \times B_2) = (2 \times 132) + (2 \times 108) = 480.$$


The easiest way to compute $T_4 = S_1 + S_2 + S_3 + S_4$ is to start with the previous section's analysis that $B_1$, which represents $S_1 + S_2 + S_3$, was computed to be $(132)$. Then, I will deduct from this the number of ways that $S_4$ is not satisfied, while each of $S_1, S_2,$ and $S_3$ are satisfied.

$S_4$ vs $S_3$ means that [3,10] must specifically be HT.
$S_4$ vs $S_2$ means that [2,3] must have at least one more H than [9,10]
$S_4$ vs $S_1$ means that [1,2,3] must have at least one more H than [8,9,10].

Putting this all together, the possibilities for [1,2,3,8,9,10] are
HHHHHT : [4-7] = 2 heads : (6) ways.
HHHTHT : [4-7] = 3 heads : (4) ways.
THHTHT : [4-7] = 3 heads : (4) ways.

HHHHTT : [4-7] = 3 heads : (4) ways.
HHHTTT : [4-7] = 4 heads : (1) way.
THHHTT : [4-7] = 3 heads : (4) ways.
THHTTT : [4-7] = 4 heads : (1) way.

HTHHTT : [4-7] = 3 heads : (4) ways.
HTHTTT : [4-7] = 4 heads : (1) way.
TTHTTT : [4-7] = 4 heads : (1) way.

Therefore, the final computation for $T_4$ is
$$T_4 = 132 - 30 = 102$$.


Putting this all together, the final computation is

$$T_1 - T_2 + T_3 - T_4 = 928 - 919 + 480 - 102 = 387.$$


Addendum
Although Inclusion-Exclusion is the preferred approach, because it generalizes so well, the (alternative) direct approach is not that bad, for this problem. Adopting the terminology in my answer:

Let $R_1$ denote the # of ways that sub-sequence $S_1$ can be satisfied.

Let $R_2$ denote the # of ways that sub-sequence $S_2$ can be satisfied where $S_1$ is not satisfied.

Let $R_3$ dentote the # of ways that sub-sequence $S_3$ can be satisfied where both $S_1$ and $S_2$ are not satisfied.

Let $R_4$ dentote the # of ways that sub-sequence $S_4$ can be satisfied where all of $S_1, S_2$ and $S_3$ are not satisfied.

Then the desired computation will be

$$R_1 + R_2 + R_3 + R_4.$$

Much of the analysis from the original answer will be repeated in the Addendum so that the Addendum will serve as a stand-alone answer.

For $R_1$, you need [1-7] to have at least 5 heads, with [8-10] unconstrained. Therefore,

$$R_1 = (2^3) \times \left[\binom{7}{0} + \binom{7}{1} + \binom{7}{2}\right] = 8 \times 29 = 232.$$


To compute $R_2$, you need [1,8] to be TH and [2-7] to have exactly 4 heads. Here, [9,10] are unconstrained. Therefore,

$$R_2 = (2^2) \times \binom{6}{4} = 4 \times 15 = 60.$$


To compute $R_3$, you need $S_3$ to be satisfied, and both $S_1$ and $S_2$ to be bad. This means that [2,9] must be [T,H] and [8,9] must have at least one more heads than [1,2]. Here, [10] will be unconstrained.

Putting this all together, the possibilities for [1,2,8,9] are
HTHH : [3-7] = 3 heads : (10) way.
TTHH : [3-7] = 3 heads : (10) ways.
TTTH : [3-7] = 4 heads : (5) ways.

Therefore,

$$R_3 = (2^1) \times (25) = 50.$$


To compute $R_4$, you need $S_4$ to be satisfied, and $S_1, S_2$ and $S_3$ must all be bad. This means that [3,10] must be [T,H] and [9,10] must have at least one more heads than [2,3] and [8,9,10] must have at least one more heads than [1,2,3].

Putting this all together, the possibilities for [1,2,3,8,9,10] are
HHTHHH : [4-7] = 2 heads : (6) ways.
THTHHH : [4-7] = 2 heads : (6) ways.
THTTHH : [4-7] = 3 heads : (4) ways.

HTTHHH : [4-7] = 2 heads : (6) ways.
TTTHHH : [4-7] = 2 heads : (6) ways.
HTTTHH : [4-7] = 3 heads : (4) ways.
TTTTHH : [4-7] = 3 heads : (4) ways.

HTTHTH : [4-7] = 3 heads : (4) ways.
TTTHTH : [4-7] = 3 heads : (4) ways.
TTTTTH : [4-7] = 4 heads : (1) way.

Therefore,

$$R_4 = 45.$$


Therefore,

$$R_1 + R_2 + R_3 + R_4 = 232 + 60 + 50 + 45 = 387.$$

2
On

This is a solution that works for $10$ tosses of coins. This is case by case counting.

i) Count of favorable arrangements $\displaystyle \small (H \gt 7) = 1 + 10 + \frac{10!}{8!2!} = \textbf{56} \ $ (for number of heads $ \gt 7$, all arrangements are favorable arrangements).

ii) Case $1$: $5$ Heads

Now if we take $\small HHHHH$ with $4$ places between them where $T$ can occur, we will have favorable arrangements with no $T$ between them, $1$ $T$ between them or $2$ between them.

Zero $T$: $6$ favorable arrangements ($\small HHHHH$ can occur starting at any toss other than $7$th toss onwards)
One $T$ (with $4$ places in between $5 H$): The block can occur at any toss starting from first place to fifth toss and there are $4$ possible places for $T$. So number of arrangements $ = 5 \cdot 4 = 20$.
Two $T$: Similarly, number of favorable arrangements = $\displaystyle \small 4 \cdot \big(\frac{4 \cdot 5}{2}\big) = 40$ favorable arrangements.

Total favorable arrangements for case $\small 1 = \textbf{66}$.

iii) Case $2$: $6$ Heads

Now if we take $\small HHHHHH$ with $5$ places between them where $T$ can occur, we will have favorable arrangements with any number of $T$'s ($0$ to $4 T$).

Zero $T$: $5$ favorable arrangements
One $T$: Number of arrangements $ = 4 \cdot 5 = 20$.
Two $T$: Number of favorable arrangements = $\displaystyle \small 3 \cdot \big(\frac{5 \cdot 6}{2}\big) = 45$ favorable arrangements.
Three $T$: Number of favorable arrangements = $\displaystyle \small 2 \cdot \big(\frac{5 \cdot 6 \cdot 7}{3!} - 10 \big) = 50$ favorable arrangements. There are $35$ arrangements with $3 T$ in between $HHHHHH$. But any arrangement of $3T$ in place of arrows in $HH \uparrow H \uparrow H \uparrow HH$ are not favorable arrangements. Those are $10$: either $3 T$ together in any of those $3$ places shown with arrows or $(2T, 1T)$ arrangements or $(T T T)$ arrangements.
Four $T$: $29$ favorable arrangements. Favorable arrangements occur when we have $4T$, $3T$ or $2T$ in place of one of the arrows in $H \uparrow H H H H \uparrow H$: $2$ arrangements with $4T$, $8$ arrangements with $3T$ ($3T$ in place of an arrow and remaining $T$ can occur in remaining $4$ places), $7$ arrangements with $(2T, 2T)$ ($8$ arrangements like last one but one duplicate when both $2T$ are in place of arrows) and $12$ arrangements with $(2T, T, T)$ [$2T$ in place of an arrow and choose $2$ places out of remaining $4$ for $(T, T)$].

Total favorable arrangements for case $\small 2 = \textbf{149}$.

iv) Case $3$: $7$ Heads

Total number of arrangements $ = \displaystyle \small \frac{10!}{7!3!} = 120$. The only arrangements that are not favorable are when we have $3T$ in place of an arrow or $(2T, T)$ in place of both arrows in $HHH \uparrow H \uparrow HHH$. That is $4$ arrangements.
So total favorable arrangements for case $\small 3 = \textbf{116}$.

Adding all of them, we have $56 + 66 + 149 + 116 = \fbox{387}$ favorable arrangements out of $1024$ possible arrangements.

1
On

This is difficult to solve in general. For not too big values of $M=$ length of the subsequences (here $M=7$) one can numerically compute the possible paths by keeping all the $2^{M-1}$ "states".

Here's an example in Java, which counts the complementary event (all subsequences must have less than $K=5$ heads).

It returns $637$ paths, which agrees with the other answers ($637=1024-387$)

import java.util.Arrays;

public class Mse4089130 {
    final int N; // 10
    final int M; // 7
    final int K; // 5 (sum must be less than K)
    final int S; // 2^(7-1) = 64
    final long[] st; // states (indexed from 0 to S-1
    int t = 0;

    public Mse4089130(int n, int m, int k) {
        N = n;
        M = m;
        K = k;
        S = (int) Math.pow(2, M - 1);
        st = new long[S];
        st[0] = 1;
    }

    public void compute() {
        for (int tt = 1; tt <= N; tt++) {
            t = tt;
            long[] stold = Arrays.copyOf(st, st.length);
            Arrays.fill(st, 0);
            for (int x = 0; x < S; x++) {
                st[evolve(x, false)] += stold[x];
                if (weight(x) < K - 1)
                    st[evolve(x, true)] += stold[x];
            }
        }
    }

    long count() {
        long ac = 0; for (long x : st) ac += x; return ac;
    }

    int evolve(int x, boolean one) {
        return one ? x / 2 + S / 2 : x / 2;
    }

    int weight(int n) {
        return Integer.bitCount(n);
    }

    public String toString() {
        return "[t=" + t + " count=" + count() + " N=" + N + ", 
          M=" + M + ", K=" + K + ", S=" + S + "]";
    }

    public static void main(String[] args) {
        Mse4089130 mse = new Mse4089130(10, 7, 5);
        System.out.println(mse.toString());
        mse.compute();
        System.out.println(mse.toString());
        System.out.println(mse.count());
    }
}
0
On

Others answers have sufficiently answered the $n=10$ case using PIE.

Let's generalize. Let $p(n,l,m)$ be the probability that in $n$ coin flips, that there is some set of consecutive $l$ flips with at least $m$ heads. You're large number question was to calculate $p(10^14, 1000,650)$.

To calculate this exactly, you'd naively need to do calculations on up to $2^{n-l+1}$ sets of good or not consecutive groups. You could probably narrow that down to some kind of recurrence doing dynamic programming on your current location and the last $l$ flips, but that still takes around $n*2^l$ calculations which is much too slow. Brute force is too slow.

Let's approximate. We can break up the $n$ coin flips into $n/l$ groups of $l$ flips. Within each group, the distribution of coin flips is approximately $N(l/2,\sqrt{l/2})$. Letting $F$ be the cdf of the unit normal distribution, this gives a probability of approximately $F(-(m-l/2)/\sqrt{l/4})$ of getting at least $m$ flips. Since each of these sets of $l$ flips are disjoint, they're independent, so the probability of any of them happening is around $q=1-(1-F(-(m-l/2)/\sqrt{l/4}))^{n/l}$. If $n/l q << 1$, then you can approximate the probability by the 1st order term, so you get a lower bound of around $n/l q$.

For an upper bound, there are $n-l+1$ groups of $l$ coins, each which will have too many heads with probability $q$, so the expected number of such groups is $(n-l+1)q$. The expected number is at least the probability of at least one group having at least $m$ coins, so this is an upper bound. $l$ is small, so this is at most $qn$.

Thus, we get that that $q\approx F(-(m-l/2)/\sqrt{l/4})$ and: $$1-(1-q)^{n/l} \leq p(n,l,m)\leq qn$$

For your numbers, $q \approx F(-150/\sqrt{500})\approx 10^{-21}$, so the upper bound is $p(10^14,1000,650)<10^{-7}$.

If you adjust things up a bunch, you can alternatively guarantee getting too many heads ($.9999<p(10^{22},1000,650)<1$).