Here is a problem I thought of:
- Suppose I am watching someone flip a fair coin. Each flip is completely independent from the previous flip.
- I watch this person flip 3 consecutive heads.
- I interrupt this person and ask the following question: If the next flip results in a "head", I will buy you a slice of pizza. If the next flip results in a "tail", you will buy me a slice of pizza.
My Question: Who has the better odds of winning?
I wrote the following simulation using the R programming language. In this simulation, a "coin" is flipped many times ("1" = HEAD, "0" = TAILS). We then count the percentage of times HEAD-HEAD-HEAD-HEAD appears compared to HEAD-HEAD-HEAD-TAILS:
#load library
library(stringr)
#define number of flips
n <- 10000000
#flip the coin many times
flips = sample(c(0,1), replace=TRUE, size=n)
#count the percent of times HEAD-HEAD-HEAD-HEAD appears
str_count(paste(flips, collapse=""), '1111') / n
0.0333663
#count the percent of times HEAD-HEAD-HEAD-TAIL appears
str_count(paste(flips, collapse=""), '1110') / n
0.062555
From the above analysis, it appears as if the person's luck runs out: after 3 HEADS, there is a 3.33% chance that the next flip will be a HEAD compared to a 6.25% chance the next flip will not be a HEAD (i.e. TAILS).
Thus, could we conclude that, even though each flip is independent of the previous flip, it becomes statistically more advantageous to observe a sequence of HEADS and then bet that the next flip will be TAILS? In other words, the longer the sequence of HEADS you observe, the higher the probability that the sequence will "break"?
Thanks
You are misled by a technical issue: it looks like the `str_count` command in R counts the number of non-overlapping matches it finds. Therefore it will not correctly count all the times that three flips of "heads" are followed by a fourth.

For example, take a sequence (where `1` represents "heads" and `0` represents "tails") in which `111` appears five times, counting overlaps. The `str_count` command can tell you that `1111` only appears once, but `1110` appears twice. But in fact, if you look at the five times that `111` appears in the string (including overlaps), then it is followed by another `1` three times, and by `0` only twice.

Over many coin-flips, if you count all instances of `1111` or `1110` and allow overlaps, you should see each one about $\frac1{16} = 0.0625$ of the time. So three "heads" are equally likely to be followed by "heads" or "tails".

(Note that this does not mean that after three "heads", there is a $0.0625$ probability of "heads" and a $0.0625$ probability of "tails": there is a $\frac12$ probability of each. The $0.0625$ counts the probability of all four flips happening.)
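
The overlap issue can be checked directly. The sketch below is an illustration, not the original poster's code: the string `11111101110` is an assumed example chosen to be consistent with the counts described above, `count_overlapping` is a hypothetical helper written for this sketch, and the seed and number of flips are arbitrary. The second half avoids string matching entirely and compares each flip with the three flips before it.

```r
library(stringr)

# Hypothetical helper: count every occurrence of `pattern` in `s`,
# including overlapping ones, by comparing each window of the same length.
count_overlapping <- function(s, pattern) {
  k <- nchar(pattern)
  n <- nchar(s)
  if (n < k) return(0L)
  starts <- seq_len(n - k + 1)
  windows <- substring(s, starts, starts + k - 1)
  sum(windows == pattern)
}

# Assumed example string (1 = heads, 0 = tails)
s <- "11111101110"

# Non-overlapping counts, as produced by str_count
non_overlap_hhhh <- str_count(s, "1111")
non_overlap_hhht <- str_count(s, "1110")

# Counts that allow overlaps
overlap_hhh  <- count_overlapping(s, "111")
overlap_hhhh <- count_overlapping(s, "1111")
overlap_hhht <- count_overlapping(s, "1110")

# Simulation without string matching
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 1000000  # arbitrary number of flips
flips <- sample(c(0L, 1L), size = n, replace = TRUE)

# Every position that starts three consecutive heads and has a fourth flip after it
i <- 1:(n - 3)
three_heads <- flips[i] == 1 & flips[i + 1] == 1 & flips[i + 2] == 1

hhhh <- sum(three_heads & flips[i + 3] == 1)
hhht <- sum(three_heads & flips[i + 3] == 0)

# Fraction of starting positions showing each four-flip pattern
frac_hhhh <- hhhh / (n - 3)
frac_hhht <- hhht / (n - 3)

# Fraction of heads among flips that follow three heads
p_head_after_hhh <- hhhh / (hhhh + hhht)
```

With overlaps allowed, `frac_hhhh` and `frac_hhht` should both come out near $\frac1{16}$, and `p_head_after_hhh` near $\frac12$. Dividing by the number of times `111` occurs, rather than by the total number of flips, is what turns the four-flip frequency into the probability for the next flip.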