For coin tosses the probability of heads and tails is the same, and the flips are said to be independent. So the probability of HHHH...H, TTTT...T, or any other specific combination of H and T should be the same, i.e. $2^{-n}$. But running a Monte Carlo simulation shows that events like all heads or all tails almost never happen for sufficiently large $n$.
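To make the observation concrete, here is a minimal Monte Carlo sketch of the experiment described above (the helper `run_trials` is my own naming, not from any library):

```python
import random

def run_trials(n_flips, n_trials, seed=0):
    """Count how often a run of n_flips fair coins comes up all heads or all tails."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        if all(flips) or not any(flips):  # all heads or all tails
            extreme += 1
    return extreme

# Even 20 flips makes an all-heads/all-tails run rare: probability
# 2 * 2**-20 per trial, i.e. about 2 in a million.
print(run_trials(20, 100_000))
```

For small `n_flips` the extreme runs show up at the expected rate (e.g. for 2 flips, HH or TT occurs about half the time); for 20 flips they essentially never do.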
Explanations I have heard so far
I have heard of the dilution effect of the law of large numbers, but I would say that's just a recursive explanation. Why are we assuming that the next batch of events will have a distribution closer to the natural distribution?
Another explanation I found: if we do not consider the positions of heads and tails, there are more ways to get a ~50% split. E.g., for two coin flips both $HT$ and $TH$ give the natural distribution, so the probability of the natural distribution is 50%, while $HH$ and $TT$ each have probability 25%. But with this knowledge we could infer that, if we have observed more $H$ than $T$ so far, we are more likely to be on a sequence that starts out with more $H$ and later adjusts with $T$s.
Can we say that the KL divergence between the observed distribution and the expected distribution approaches zero as the number of trials tends to infinity? If so, what is this invisible hand? The second law of thermodynamics? Is there a way to measure its push?
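The first part of this question can be checked numerically. A minimal sketch (the helper `kl_bernoulli` is my own naming) that computes the KL divergence between the empirical head frequency and the fair coin for growing $n$:

```python
import math
import random

def kl_bernoulli(p_hat, p=0.5):
    """KL divergence D(p_hat || p) for a coin, in nats; 0 * log(0/q) := 0."""
    d = 0.0
    for ph, q in ((p_hat, p), (1 - p_hat, 1 - p)):
        if ph > 0:
            d += ph * math.log(ph / q)
    return d

rng = random.Random(1)
for n in (10, 100, 10_000, 1_000_000):
    heads = sum(rng.random() < 0.5 for _ in range(n))
    print(n, kl_bernoulli(heads / n))
```

The printed divergences shrink toward zero as $n$ grows, which is the empirical face of the law of large numbers; there is no "push" on individual flips, only the averaging.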
Other questions I have read so far
- Why don't previous events affect the probability of (say) a coin showing tails?
- Law of large numbers - almost sure convergence
- Is the Law of Large Numbers empirically proven?
- The Law of Large Numbers and the Probability of Bizarre Outcomes
- Gambler's fallacy and the Law of large numbers
- Betting: Gambler's Fallacy vs. Law of Large Numbers
- Bernoulli Trials: Law of Large Numbers vs Gambler's Fallacy, the N paradox
For the (fair) coin experiment with $n$ independent tosses it holds that $$ \tag{1} \mathbb{P}(\text{a specific sequence is observed}) = 2^{-n}, $$ which tends to zero for large $n$. Note that this holds for any sequence (not only for "HHH...H" and "TTT...T"). That is, any specific sequence, irrespective of the order (and number) of heads and tails, is extremely unlikely to be observed in the large-$n$ limit. This fact has nothing to do with the law of large numbers.
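You can verify (1) for small $n$ by tallying all $2^4 = 16$ sequences of four tosses; a quick simulation sketch:

```python
import random
from collections import Counter

rng = random.Random(2)
n, trials = 4, 160_000
counts = Counter(
    "".join(rng.choice("HT") for _ in range(n)) for _ in range(trials)
)
# All 16 sequences appear with frequency close to 2**-4 = 1/16,
# including "HHHH" and "TTTT": no sequence is special.
for seq, c in sorted(counts.items()):
    print(seq, c / trials)
```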
The law of large numbers suggests the following (heuristically stated):
$$ \mathbb{P}(\text{a sequence is observed with number of heads} \approx n/2)\approx 1. $$ Note that the event considered here does not concern a sequence with a specific order of heads and tails (as was the case for the event in (1)), and it does not contradict (1), as you seem to suggest in your question. The law of large numbers only states that, out of all the (individually highly unlikely) sequences, the one that will actually be observed will have (with high probability) approximately $n/2$ heads. However, this "insight" cannot help you increase your chance of guessing the sequence of an experiment, since there is an extremely large number of sequences in the category "a sequence with number of heads approximately equal to $n/2$": the central binomial coefficient is $\binom{n}{n/2} \approx 2^{n}\sqrt{2/(\pi n)}$, so on the order of $2^{n}/\sqrt{n}$ such sequences exist.
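The counting behind this can be made exact with binomial coefficients. A sketch for $n = 100$, with "approximately $n/2$ heads" taken (arbitrarily, for illustration) to mean a head count between 40 and 60:

```python
from math import comb

n = 100
total = 2 ** n  # number of equally likely sequences

# Probability mass of all sequences whose head count is within 10 of n/2:
mass = sum(comb(n, k) for k in range(40, 61)) / total
print(mass)         # the bulk of the probability sits here
print(comb(n, 50))  # a huge number of "typical" sequences...
print(1 / total)    # ...each individually as unlikely as all-heads
```

So the typical *category* is near-certain even though every *member* of it has the same tiny probability $2^{-n}$ as the all-heads sequence.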
With the same arguments, you can see that if the coin is biased (say, the probability of heads is $p>1/2$), the single most probable sequence is the all-heads sequence ($n$ heads), whereas the law of large numbers indicates that the observed sequence will most likely have approximately $np<n$ heads. Again, there is no contradiction here, since the probability of observing any specific sequence (even the most likely one) is extremely small in the large-$n$ limit. Chances are that a sequence other than the most likely one will be observed, and the law of large numbers says it will have approximately $np$ heads.
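A quick sanity check of the biased case (the parameters $p = 0.7$ and $n = 10^5$ are arbitrary choices for illustration):

```python
import random

rng = random.Random(3)
p, n = 0.7, 100_000
heads = sum(rng.random() < p for _ in range(n))
# Any single sequence with h heads has probability p**h * (1-p)**(n-h),
# which is maximized by the all-heads sequence; yet the observed
# fraction of heads concentrates near p, not near 1:
print(heads / n)
```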