Suppose we are given a coin with no information about its bias ($P(H)$ is unknown) or about the independence of successive tosses ($P(H|T)$ may or may not equal $P(H)$); we only know that $P(H) + P(T) = 1$. Suppose we toss the coin a sufficiently large number of times and are asked to estimate $P(H|T)$, that is, the probability of getting heads immediately after getting tails. How should we calculate this probability?
- Can we group the outcomes in exclusive pairs or should we apply a rolling window of two to the sequence?
The results will differ depending on this choice. Suppose we see the following sequence: ...HHTT... If we group HH and TT as separate pairs, the count of HT will be different than if we count pairs on a rolling basis (HH, HT, TT).
- Why doesn't the law of conditional probabilities $P(A|B) =\frac{P(A, B)}{P(B)}$ apply here?
Hi: If you want to estimate $P(H | T)$, take (the number of times that an H occurred immediately after a T) divided by (the number of times that a T occurred with another toss after it). In other words, it is the number of TH sequences divided by the number of T's that are followed by another toss.
So, if you had T, H, T, T, H, the estimate would be 2/3: three T's are followed by another toss, and two of those are followed by an H.
In your specific example with H, H, T, T, it would be 0/1 = 0: the last T cannot be counted because the toss after it was not observed.
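If it helps, the counting rule above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the function name `estimate_p_h_given_t` is made up for this example, and it assumes the tosses arrive as a string (or list) of `"H"` and `"T"` characters. It uses overlapping (rolling) pairs, so every toss except the last one is counted once as the "previous" toss.

```python
def estimate_p_h_given_t(tosses):
    """Estimate P(H | T) from overlapping pairs of consecutive tosses.

    Hypothetical helper for illustration; `tosses` is assumed to be a
    sequence of "H" and "T" characters.
    """
    # Pair each toss with the one immediately after it. The final toss
    # has no successor, so it never appears as the first element of a pair.
    pairs = list(zip(tosses, tosses[1:]))

    # Denominator: T's that were followed by another toss.
    t_followed = sum(1 for first, _ in pairs if first == "T")
    # Numerator: T's that were immediately followed by an H.
    th = sum(1 for first, second in pairs if first == "T" and second == "H")

    if t_followed == 0:
        return None  # no T with an observed successor: estimate undefined
    return th / t_followed


print(estimate_p_h_given_t("THTTH"))  # 2 of the 3 counted T's are followed by H
print(estimate_p_h_given_t("HHTT"))   # the only counted T is followed by T
```

Grouping the tosses into exclusive pairs instead would discard every second transition (the HT in the middle of HHTT, for instance), which is why the rolling count is the one used here.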