Is the Law of Large Numbers empirically proven?


Does this reflect the real world and what is the empirical evidence behind this?

[Wikipedia illustration]

Layman here so please avoid abstract math in your response.

The Law of Large Numbers states that the average of the results from multiple trials will tend to converge to its expected value (e.g. 0.5 in a coin toss experiment) as the sample size increases. The way I understand it, while the first 10 coin tosses may result in an average closer to 0 or 1 rather than 0.5, after 1000 tosses a statistician would expect the average to be very close to 0.5 and definitely 0.5 with an infinite number of trials.

Given that a coin has no memory and each coin toss is independent, what physical laws would determine that the average of all trials will eventually reach 0.5? More specifically, why does a statistician believe that a random event with 2 possible outcomes will have a close-to-equal amount of both outcomes over, say, 10,000 trials? What prevents the coin from falling 9900 times on heads instead of 5200?

Finally, since gambling and insurance institutions rely on such expectations, are there any experiments that have conclusively shown the validity of the LLN in the real world?

EDIT: I do differentiate between the LLN and the Gambler's fallacy. My question is NOT if or why any specific outcome or series of outcomes become more likely with more trials--that's obviously false--but why the mean of all outcomes tends toward the expected value?

FURTHER EDIT: LLN seems to rely on two assumptions in order to work:

  1. The universe is indifferent towards the result of any one trial, because each outcome is equally likely
  2. The universe is NOT indifferent towards any one particular outcome coming up too frequently and dominating the rest.

Obviously, we as humans would label a 50/50 or similar distribution in a coin-toss experiment "random", but if heads or tails turns out to be, say, 60-70% after thousands of trials, we would suspect there is something wrong with the coin and that it isn't fair. Thus, if the universe is truly indifferent towards the average of large samples, there is no way we can have true randomness and consistent predictions: there will always be a suspicion of bias unless the total distribution is somehow kept in check by something that preserves the relative frequencies.

Why is the universe NOT indifferent towards big samples of coin tosses? What is the objective reason for this phenomenon?

NOTE: A good explanation would not be circular: justifying probability with probabilistic assumptions (e.g. "it's just more likely"). Please check your answers, as most of them fall into this trap.

There are 17 answers below.

Answer (17 votes, accepted)

Reading between the lines, it sounds like you are committing the fallacy of the layman interpretation of the "law of averages": that if a coin comes up heads 10 times in a row, then it needs to come up tails more often from then on, in order to balance out that initial asymmetry.

The real point is that no divine presence needs to take corrective action in order for the average to stabilize. The simple reason is attenuation: once you've tossed the coin another 1000 times, the effect of those initial 10 heads has been diluted to mean almost nothing. What used to look like 100% heads is now a small blip only strong enough to move the needle from 50% to 51%.

Now combine this observation with the easily verified fact that 9900 out of 10000 heads is simply a less common combination than 5000 out of 10000. The reason for that is combinatorial: there is simply less freedom in hitting an extreme target than a moderate one.

To take a tractable example, suppose I ask you to flip a coin 4 times and get 4 heads. If you flip tails even once, you've failed. But if instead I ask you to aim for 2 heads, you still have options (albeit slimmer ones) no matter how the first two flips turn out. Numerically, we can see that 2 out of 4 can be achieved in 6 ways: HHTT, HTHT, HTTH, THHT, THTH, TTHH. But the 4-out-of-4 goal can be achieved in only one way: HHHH. If you work out the numbers for 9900 out of 10000 versus 5000 out of 10000 (or any specific number in that neighbourhood), that disparity becomes truly immense.
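The counting argument above is easy to check with a few lines of Python (a sketch using the standard library's `math.comb`):

```python
from math import comb

# Ways to hit each target with 4 flips
print(comb(4, 2))  # 6: HHTT, HTHT, HTTH, THHT, THTH, TTHH
print(comb(4, 4))  # 1: HHHH only

# The same comparison for 10,000 flips: the gap becomes astronomical.
moderate = comb(10000, 5000)  # sequences with exactly 5000 heads
extreme = comb(10000, 9900)   # sequences with exactly 9900 heads
print(len(str(moderate // extreme)))  # the ratio has thousands of digits
```

Exact integer arithmetic makes the disparity concrete: the ratio of moderate to extreme sequence counts is a number with well over two thousand digits.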

To summarize: it takes no conscious effort to get an empirical average to tend towards its expected value. In fact it would be fair to think in the exact opposite terms: the effect that requires conscious effort is forcing the empirical average to stray from its expectation.

Answer (10 votes)

Nice question! In the real world, we don't get to let $n \to \infty$, so the question of why the LLN should be of any comfort is important.

The short answer to your question is that we cannot empirically verify the LLN, since we can never perform an infinite number of experiments. It's a theoretical idea that is very well founded, but, like all applied mathematics, whether a particular model or theory holds is a perennial concern.

A more useful law from a statistical standpoint is the Central Limit Theorem, together with the various probability inequalities (Chebyshev, Markov, Chernoff, etc.). These allow us to bound or approximate the probability of our sample average being far from the true value for a finite sample.

As for an actual experiment to test the LLN, one can hardly do better than John Kerrich's 10,000-flip coin-tossing experiment: he got 50.67% heads!
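Kerrich's experiment is also easy to mimic in software; a minimal sketch (the seed is arbitrary, chosen only to make the run reproducible):

```python
import random

random.seed(1)  # arbitrary seed for reproducibility
n = 10_000
heads = sum(random.randint(0, 1) for _ in range(n))
print(f"{100 * heads / n:.2f}% heads in {n} simulated flips")
```

Any run lands within a percent or two of 50%, just as Kerrich's physical flips did.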

So, in general, I would say the LLN is empirically well supported: scientists from all fields rely upon sample averages to estimate models, and this approach has been largely successful, so sample averages appear to converge nicely for finite, feasible sample sizes.

There are "pathological" cases one can construct (I'll spare you the details) where one needs astronomical sample sizes to get a reasonable probability of being close to the true mean. This becomes apparent if you use the Central Limit Theorem; the LLN alone is simply not informative enough to give much comfort in day-to-day practice.

The physical basis for probability

It seems you still have an issue with why long-run averages exist in the real world, apart from what probability theory says about the behavior of these averages assuming they exist. Let me state a fact that may help you:

Fact: Neither probability theory nor the existence of long-run averages requires randomness!

The determinism vs. indeterminism debate is for philosophers, not mathematicians. The notion of probability as a physical observable comes from ignorance or absence of the detailed dynamics of what you are observing. You could just as easily apply probability theory to a boring ol' pendulum as to the stock market or coin flips...it's just that with pendulums we have a nice, detailed theory that allows us to make precise estimates of future observations. I have no doubt that a full physical analysis of a coin flip would allow us to predict which face would come up...but in reality, we will never know this!

This isn't an issue though. We don't need to assume a guiding hand nor true indeterminism to apply probability theory. Let's say that coin flips are truly deterministic; we can still apply probability theory meaningfully if we assume a couple of basic things:

  1. The underlying process is ergodic...okay, this is a bit technical, but it basically means that the process dynamics are stable over the long term (e.g., we are not flipping coins in a hurricane, or where tornadoes pop in and out of the vicinity!). Note that I said nothing about randomness...this could be a totally deterministic, albeit very complex, process...all we need is that the dynamics are stable (i.e., we could write down a series of equations with specific parameters for the coin flips, and they wouldn't change from flip to flip).
  2. The values the process can take on at any time are "well behaved". Basically, as I said elsewhere regarding the Cauchy, the system should not produce values that consistently exceed roughly $n$ times the sum of all previous observations. It may happen once in a while, but it should become very rare, very fast (the precise definition is somewhat technical).

With these two assumptions, we now have a physical basis for the existence of a long-run average of a physical process. Now, if the process is complicated, then instead of using physics to model it exactly, we can apply probability theory to describe its statistical properties (i.e., aggregated over many observations).

Note that the above is independent of whether or not we have selected the correct probability model. Models are made to match reality...reality does not conform itself to our models. Therefore, it is the job of the modeler, not nature or divine providence, to ensure that the results of the model match the observed outcomes.

Hope this helps clarify when and how probability applies to the real world.

Answer (3 votes)

The physical assumptions are that in each trial of tossing the coin, the coin is identical, and the laws of physics are identical, and the coin in no way "remembers" what it did before. With those assumptions, you can then say that there is some number between $0$ and $1$ that represents the probability of any given toss coming up heads.

Warning: that probability need not be $\frac{1}{2}$. In fact, a standard US penny will land on tails about 51% of the time.

Once you have that number, which we could call $p$, then it is meaningful to talk about the expected value of the number of heads arising in $1$ toss, which is that same $p$, and the expected value of the number of heads arising in $N$ tosses (the average result of $N$ trials) which is also $p$ because the tosses are completely independent.

Then the practical effect of LLN is to know that the likelihood of the average number of heads in an actual set of $N$ trials being "far" from $p$ becomes vanishingly small, provided that by "far" you mean more than a few times $\sqrt{1/N}$. And since for very large $N$, $\sqrt{1/N}$ becomes very small, we can say that with probability almost 1 the average of $N$ trials will lie in a small range about its in-principle value of $p$.
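To put numbers on that $\sqrt{1/N}$ scale (a sketch; $0.5\sqrt{1/N}$ is the standard deviation of the sample proportion when $p = \frac{1}{2}$):

```python
from math import sqrt

# Typical deviation of the observed fraction of heads from p = 0.5,
# for increasing numbers of flips N. Prints 0.05, 0.005, 0.0005.
for n in (100, 10_000, 1_000_000):
    print(n, 0.5 * sqrt(1 / n))
```

So after a million flips, "far from $p$" already means a deviation in the fourth decimal place.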

Answer (3 votes)

One has to distinguish between the mathematical model of coin tossing and factual coin tossing in the real world.

The mathematical model has been set up in such a way that it behaves provably according to the rules of probability theory. These rules do not come out of thin air: They encode and describe in the most economical way what we observe when we toss real coins.

The deep problem is: Why do real coins behave the way they do? I'd say this is a question for physicists. An important point is symmetry. If there is a clear cut "probability" for heads, symmetry demands that it should be ${1\over2}$. Concerning independence: There are so many physical influences determining the outcome of the next toss that the face the coin showed when we picked it up from the table seems negligible. And on, and on. This is really a matter of philosophy of physics, and I'm sure there are dozens of books dealing with exactly this question.

Answer (4 votes)

One has to distinguish between the mathematical model of coin tossing and the human intuition of it.

It is worthwhile to consider the following experiment.

A teacher divides his class into two groups. He gives a coin to each member of one group. Each member of this group will flip his coin, say, 100 times and jot down the results. The members of the other group will not have coins; they will simulate the coin-flipping experiment by writing down imaginary results. Then everybody puts a secret mark on his paper. Finally the papers get shuffled and the children hand over the stack to the teacher. Surprisingly, the teacher will be able to tell, with quite high certainty, who flipped coins and who just imagined the experiments. How? The average length of the runs of consecutive heads (or tails) in the real experiments is way longer than in the imaginary ones.

This demonstration, among other interesting examples, illustrates that the instinctive human understanding of random phenomena is quite unreliable.
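The teacher's trick rests on a checkable fact: genuine 100-flip sequences almost always contain a run of five or more identical results, while invented ones rarely do. A quick simulation (a sketch using only the standard library; the seed is arbitrary):

```python
import random
from itertools import groupby

def longest_run(seq):
    # Length of the longest block of consecutive identical outcomes
    return max(len(list(g)) for _, g in groupby(seq))

random.seed(0)
trials = 5_000
with_long_run = sum(
    longest_run([random.randint(0, 1) for _ in range(100)]) >= 5
    for _ in range(trials)
)
print(f"{100 * with_long_run / trials:.1f}% of real 100-flip sequences have a run of 5+")
```

The fraction comes out well above 90%, which is exactly the statistical fingerprint the teacher looks for.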

So not only does probability theory have nothing to do with reality, it does not have anything to do with human intuition either. However, falsifying the predictions of probability theory proves tiresome time after time. (Validating them is impossible, of course; in this regard probability theory is not special.)

Answer (4 votes)

There is no physical law in play here, just probabilities.

Assume that either result (heads or tails) is equally likely. For any number of flips in a trial, N, it is easy to compute the probability of getting H heads.

For N = 2, H(0) = 0.25, H(1) = 0.5, H(2) = 0.25 (Four possible outcomes, two of which are HT and TH)

For N = 6, H(0) = 0.016, H(1) = 0.094, H(2) = 0.234, H(3) = 0.313, H(4) = 0.234, H(5) = 0.094, H(6) = 0.016 (64 possible outcomes, 50 of which are 2H4T, 3H3T, 4H2T).

Notice that for 6 flips, the chance you will see 2, 3, or 4 heads is 78%. As N gets bigger, the probability of getting a number of heads in the vicinity of the halfway mark becomes very great, and the likelihood of seeing very many or very few heads becomes very small.

There is no force pushing toward the mean; it's just that the probability of seeing one of the very unlikely outcomes is very, very small. But even then, you might see one someday.

Note that this is just a restatement of Erick Wong's answer.

Imagine that there are 2^N tables in a vast room, each with N coins laid out on the table in a unique combination. Each table has a chair, and you are dropped from the ceiling into the room and land in a chair at one of the tables, chosen at random. That is the "trial" you just ran. Chances are that table will have approximately N/2 heads. Remember that out of the 2^N tables (e.g., for 1000 coins there will be over 10^301 tables), only one has no heads.
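The room-of-tables picture can be counted exactly (a sketch; `fraction_near_half` is a name made up for illustration):

```python
from math import comb

def fraction_near_half(n, tol=0.05):
    # Fraction of the 2^n tables whose head count is within tol*n of n/2
    lo = int(n * (0.5 - tol))
    hi = int(n * (0.5 + tol))
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n

for n in (10, 100, 1000):
    print(n, fraction_near_half(n))
```

The fraction of tables sitting near the halfway mark climbs toward 1 as the number of coins grows, which is the whole content of the "no magic" argument.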

Answer (19 votes)

This isn't an answer, but I thought this group would appreciate it. Just to show that the behavior in the graph above is not universal, I plotted the sequence of sample averages for a standard Cauchy distribution for $n=1,\ldots,10^6$. Note how, even at extremely large sample sizes, the sample average jumps around.

If my computer weren't so darn slow, I could increase this by another order of magnitude and you'd not see any difference. The sample average for a Cauchy Distribution behaves nothing like that for coin flips, so one needs to be careful about invoking LLN. The expected value of your underlying process needs to exist first!

[Plot: running sample average of 10^6 standard Cauchy draws; it never settles down.]

Response to OP concerns

I did not bring this example up to further concern you, but merely to point out that "averaging" does not always reduce the variability of an estimate. The vast majority of the time, we are dealing with phenomena that possess an expected value (e.g., coin tosses of a fair coin). However, the Cauchy is pathological in this regard, since it does not possess an expected value...so there is no number for your sample averages to converge to.

Now, many moons ago when I first encountered this fact, it blew my mind...and shook my confidence in statistics for a short time! However, I've come to be comfortable with this fact. At the intuitive level (and as many of the posters here have pointed out) what the LLN relies upon is the fact that no single outcome can consistently dominate the sample average...sure, in the first few tosses the outcomes do have a large influence, but after you've accumulated $10^6$ tosses, you would not expect the next toss to change your sample average from, say, 0.1 to 0.9, right? It's just not mathematically possible.

Now enter the Cauchy distribution...it has the peculiar property that, no matter how many values you are currently averaging over, the absolute value of the next observation has a good (i.e., not vanishingly small; this part is somewhat technical, so maybe just accept this point) chance of being larger (much larger, in fact) than $n$ times the sum of all previous values observed...take a moment to think about this: it means that at any moment your sample average can be converging to some number, then WHAM, it gets shot off in a different direction. This will happen infinitely often, so your sample average will never settle down like it does for processes that possess an expected value (e.g., coin tosses, normally distributed variables, Poisson, etc.). Thus, you will never have an observed sum and an $n$ large enough to swamp the next observation.
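One can watch this happen with a short stdlib-only simulation (a sketch; the inverse-CDF trick $\tan(\pi(U-\tfrac12))$ turns uniform draws into standard Cauchy draws, and the seed is arbitrary):

```python
import math
import random

random.seed(42)  # arbitrary seed for reproducibility

def cauchy():
    # tan(pi*(U - 1/2)) of a uniform U is a standard Cauchy draw
    return math.tan(math.pi * (random.random() - 0.5))

for n in (10**3, 10**4, 10**5):
    xs = sorted(cauchy() for _ in range(n))
    mean = sum(xs) / n
    median = xs[n // 2]
    print(f"n={n}: mean={mean:+8.3f}  median={median:+.4f}")
```

The medians hug 0 while the means wander; with other seeds the means wander differently, which is exactly the point.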

I've asked @sonystarmap if he/she would mind calculating the sequence of medians, as opposed to the sequence of averages, in their post (similar to my post above, but with 100x more samples!). What you should see is that the median of a sequence of Cauchy random variables does converge in LLN fashion. This is because the Cauchy, like all random variables, does possess a median. This is one of the many reasons I like using medians in my work, where normality is almost surely (sorry, couldn't help myself) false and there are extreme fluctuations. Not to mention that the sample median minimizes the average absolute deviation, when the latter exists.

Second Addition: Cauchy DOES have a Median

To add another detail (read: wrinkle) to this story, the Cauchy does have a median, and so the sequence of sample medians does converge to the true median (i.e., $0$ for the standard Cauchy). To show this, I took the exact same sequence of standard Cauchy variates I used to make my first graph of the sample averages, took the first 20,000, and broke them up into four intervals of 5000 observations each (you'll see why in a moment). I then plotted the sequence of sample medians as the sample size approaches 5000 for each of the four independent sequences. Note the dramatic difference in convergence properties!

This is another application of the law of large numbers, but to the sample median. Details can be seen here.

[Plot: four sequences of running sample medians of standard Cauchy draws, each converging to 0.]

Answer (2 votes)

It looks like most of the answers are addressing the apparent (but maybe not actual) misunderstanding behind your question. I will try to give a more direct mathematical explanation. I know you said to "avoid abstract math," so I will try to explain what I'm doing.

Suppose we have a random variable $X$. Basically, this is an abstraction of a random or unpredictable event. It has multiple possible values, each with a probability of being the result. We calculate the expected value of $X$, or $E(X)$, by multiplying each possible result by its probability and adding them together. This is also called the mean, $\mu$.

We can also determine how "spread out" the possible values are, by calculating the variance. The variance, $\sigma^2$, is the expected value of the square of the deviation from the mean, which is how far the random variable is from its expected value. That is, the deviation is $X-\mu$, and the variance is $\sigma^2=E\left((X-\mu)^2\right)$. We also have standard deviation $\sigma$, which is the square root of the variance.

Intuitively, we can say that "most" of the time, the result of a random test will be "close" to the expected value. If we know the random variable's variance, we can define "close" in terms of the variance or standard deviation and make this a mathematical statement. In particular, $$P(|X-\mu|\ge k\sigma)\le\frac{1}{k^2}$$ This is Chebyshev's Inequality, and it says that the probability that a random variable is $k$ or more standard deviations from the mean is at most $1/k^2$. While this exact result might not be obvious, the idea should be clear: if there were more likely outcomes farther away, then the variance would be higher. From this, we can prove the (weak) Law of Large Numbers.

Let us take $n$ independent random variables $X_1,X_2,\ldots,X_n$ with the same distribution, with finite mean $\mu$ and finite variance $\sigma^2$, and define their average as $\overline{X}_n=\frac{1}{n}(X_1+\ldots+X_n)$. Then $E(\overline{X}_n)=\mu$, and $Var(\overline{X}_n)=Var(\frac1n(X_1+\ldots+X_n))=\frac{\sigma^2}{n}$

Obviously, for any positive real number $\epsilon$, $|\overline{X}_n-\mu|$ is either greater than, less than, or equal to $\epsilon$. There are no other possibilities, so \begin{equation} P(|\overline{X}_n-\mu|<\epsilon)+P(|\overline{X}_n-\mu|\ge\epsilon)=1\\ P(|\overline{X}_n-\mu|<\epsilon)=1-P(|\overline{X}_n-\mu|\ge\epsilon) \end{equation} Then, applying Chebyshev's Inequality to $\overline{X}_n$, whose standard deviation is $\sigma/\sqrt{n}$ (so we substitute $k=\frac{\epsilon\sqrt{n}}{\sigma}$), gives $$P(|\overline{X}_n-\mu|<\epsilon)\ge1-\frac{\sigma^2}{n\epsilon^2}$$ So as we take more trials, that is, as $n\to\infty$, this lower bound approaches $1$. And since probabilities cannot be greater than $1$, we have $$\lim_{n\to\infty}P(|\overline{X}_n-\mu|<\epsilon)=1$$ Or, equivalently, $$\lim_{n\to\infty}P(|\overline{X}_n-\mu|\ge\epsilon)=0$$ This is the Weak Law of Large Numbers.
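The bound in the last display can be sanity-checked by simulation for coin flips, where $\mu = 0.5$ and $\sigma^2 = 0.25$ (a sketch; the seed is arbitrary):

```python
import random

random.seed(7)  # arbitrary seed for reproducibility
n, eps, trials = 1000, 0.05, 2000
chebyshev_bound = 0.25 / (n * eps**2)  # sigma^2 / (n * eps^2) = 0.1

# Empirical frequency of the sample average straying eps or more from 0.5
exceed = sum(
    abs(sum(random.randint(0, 1) for _ in range(n)) / n - 0.5) >= eps
    for _ in range(trials)
) / trials
print(f"empirical P(|avg - 0.5| >= {eps}): {exceed}  (bound: {chebyshev_bound})")
```

The empirical frequency comes out far below the bound; Chebyshev is crude, but crude is all the proof above needs.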

It is important for me to point out, for general understanding, that having a probability of $0$ is not quite the same thing as being literally impossible. What it means is that almost all infinite sequences of trials (in a measure-theoretic sense) have averages converging to the expected value. Sequences whose average fails to converge do exist; they just collectively carry probability zero.

Answer (1 vote)

Based on your remarks, I think you are actually asking

"Do we observe the physical world behaving in a mathematically predictable way?"

"Why should it do so?"

Leading to:

"Will it continue to do so?"

See, for example, this Philosophy Stack Exchange question.

My take on the answer is that, "Yes", for some reason the physical universe seems to be a machine obeying fixed laws, and this is what allows science to use mathematics to predict behaviour.

So, if the coin is unbiased and the world behaves consistently, then the number of heads will vary in a predictable way.

But please note that it is not expected to converge to exactly half. In fact, the excess or deficit will grow as $\sqrt N$, which actually increases with $N$. It is the proportion of the excess relative to the total number of trials $N$ which goes to zero.

However, no one can ever prove, even in principle, whether, for example, the universe actually has a God who decides how the coin will fall. I recall that in Peter Bernstein's book about risk the story is told that the Romans (who did not know probability as a concept) had rules for knucklebone-based games that effectively assumed this.

Finally, if you ask which state of affairs is "well supported by evidence", the evidence available would include at least all of science and the finance industry. That's enough for most of us.

Answer (6 votes)

I think it is very helpful to redefine the Law of Large Numbers:

Wikipedia gives it as follows:

According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

However, it's important to note that the law isn't describing a physical law so much as a mathematical one. It would be better stated as:

As more trials are performed, the probability that the sample average deviates from the expected value by any given amount gets smaller and smaller.

In other words within the mathematical framework of probabilities, a larger set has a larger probability of being closer to the mathematical probabilistic mean.

What seems to be bothering the OP is why probabilities have any bearing on the physical world. As commented above, frequentist probability is just a description of possible outcomes and their ratios; it never explains why, or what physical law keeps the world in sync with such a law. The OP's question is more a physics/philosophy question (one that has bothered me for ages). It reminds me of the is-ought problem.

As an example, given an infinite number of universes, there will be one universe where all random events follow the most unlikely outcome. The poor fellow living in such a universe would be best off always taking the worst odds. Why should we assume that we are in a universe that happens to follow the most probable outcome? (Of course, one will argue that according to probabilities we should find ourselves in a universe close to the mean. I just mean this as an example to bring out the problem that there is no physical necessity for the Law of Large Numbers to be true in real life.)

This is not the same as asking "why does physics stay the same": even if we accept that the speed of light is constant and that mass creates gravity, it's a much bigger stretch to say that there is a general physical law that minds probabilities in the real world, always keeping them in sync with mathematics. The difference is that the other laws apply in any given physical situation (mass will always create gravity, etc.), whereas probability by definition allows for variation, merely claiming that the mean will eventually add up. (As I argued before, it doesn't even really claim this.) (From studying quantum physics and uncertainty, it really does seem as if the universe corrects itself over large samples of purely random events to match the mean.)

Edit: I've found that the problem described - the empirical/logical meaning of probabilities - has already been addressed by David Hume in An Enquiry Concerning Human Understanding, Section VI: of Probability, and at length by Henri Poincaré in Science and Hypothesis. (An additional resource, though in Hebrew, is Sha'arei Yosher 3.2.3)

Answer (2 votes)

Consider coin tosses. The strong law of large numbers says that if the coin tosses are independent and identically distributed (iid.), then for almost any experiment, the averages converge to the probability of a head.

The degree to which the result is applicable in the 'real' world depends on the degree to which the assumptions are valid.

Both independence and identically distributed are impossible to verify for real systems, the best we can do is to convince ourselves empirically by many observations, symmetry in the underlying physics, etc. (As a slightly related aside, sometimes serious mistakes are made, for example, read the LTCM story.)

The iid. assumption ensures that no experiment is favoured. For example, in a sequence of $n$ coin tosses, there are $2^n$ experiments and each is 'equi-probable'. It is not hard to convince yourself that for large $n$ the percentage of experiments whose average is far from the mean becomes very small. There is no magic here.

I think a combination of the central limit theorem and the observed prevalence of normal distributions in the 'real' world provides stronger empirical 'evidence' that the iid. assumption is often a reasonable one.

Answer (0 votes)

Please also consider this: most human games are flawed. Heads or tails depends on the coin and the way it is thrown. One man throwing the same coin will probably get something far from 50-50, be it because he's a cheater or because he always puts the same force on the same side, making the coin flip the same number of times in the air.

But let's say now that you are considering different people with different hands; then you're very likely to hit near 50-50 quite quickly.

When playing the lottery, some people think they should play numbers that don't come up as often as others, as the LLN will "have" to make them appear more often now to compensate. This is twice wrong.

  1. As someone already said, the law should not be understood as a magic hand that compensates for the first inequities. Each try keeps a 50% chance, and the first anomalies just "dilute" into the growing number of trials. There is no statistical reason to look at the previous throws; they don't impact the future ones.

  2. The practical case is even worse: since the coin (or the lottery balls) is not perfect, this imperfection will likely play the same role every time, making the same result more probable. So the trick in the lottery is to play precisely the numbers that have already won!

Of course, knowing that, the lottery guys are changing balls now and then...

Answer (0 votes)

Perhaps a better way to understand the concept is to compute the probability of many trials coming out balanced. For example, if we flip a coin 10 times, the probability that the numbers of heads and tails will be within 10% of each other is only 24.6%. However, as we flip the coin more times, the probability that the numbers of heads and tails will be close to each other (within 10%) increases:

100 trials: 38.3%

1000 trials: 80.5%

10,000 trials: 99.99%

Thus, there is no need to stipulate a "law"; we can simply compute the probability of balance occurring and see that it increases as we do more trials. Note that there is always a chance of imbalance. For example, after 10,000 coin flips there is a 0.007% chance that the number of heads will not be within 10% of the number of tails.
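The first two figures can be reproduced exactly with a short script (a sketch; "within 10% of each other" is read here as each count being at least 90% of the other, and `prob_balanced` is a name made up for illustration):

```python
from math import comb

def prob_balanced(n, tol=0.10):
    # P(heads and tails counts are each at least (1 - tol) times the other)
    favorable = sum(
        comb(n, h)
        for h in range(n + 1)
        if h >= (1 - tol) * (n - h) and (n - h) >= (1 - tol) * h
    )
    return favorable / 2**n

print(round(100 * prob_balanced(10), 1))   # 24.6
print(round(100 * prob_balanced(100), 1))  # 38.3
```

Exact binomial counting, no simulation needed: the "law" is nothing more than this sum growing with the number of flips.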

Answer (0 votes)

Suppose you've tossed a fair coin ten times, and it has been heads nine times out of ten, for an observed $\frac{\mathrm{heads}}{\mathrm{flips}} = 0.9$. There is a 50% chance that the next toss will be heads, making 10/11 heads, and a 50% chance that the next toss will be tails, making 9/11 heads. The expected fraction of heads after the next toss is then $0.5 \frac{10}{11} + 0.5 \frac{9}{11} = \frac{19}{22} \approx 0.864$, which is closer to 0.5 than 0.9 is.
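The arithmetic can be verified with exact fractions (a sketch using the standard library's `fractions`):

```python
from fractions import Fraction

heads, flips = 9, 10
half = Fraction(1, 2)
# Expected fraction of heads after one more fair toss
expected = half * Fraction(heads + 1, flips + 1) + half * Fraction(heads, flips + 1)
print(expected, float(expected))  # 19/22, approximately 0.8636
```

The expected fraction after the next toss, 19/22, is strictly closer to 0.5 than the current 0.9.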

It's pure math. Given a fair coin with no memory, if the fraction of heads up until now is 0.5, then the expected fraction of heads after one more toss remains 0.5. Otherwise, the expected fraction of heads after one more toss becomes closer to 0.5. It doesn't take any physical effect, just the fact that every flip increases the denominator of your fraction, while only half of the flips will reinforce any "excess" of heads or tails.

Answer (2 votes)

There are plenty of correct answers here. Let me see if I can make the correct answer dead-simple.

The Gambler's Fallacy is the belief that a past trend in random events will tend to be balanced by an opposite trend in future random events: "If the last 10 coin flips have been heads, the next coin flip is more likely to be tails."

The Law of Large Numbers is the observation that, regardless of the nature or pattern of the variation, as your sample size gets larger the significance of the variation (whether positive or negative) gets smaller: "If the last 10 coin flips have all been heads, that has a significant impact on the average of a sample of 50, but an insignificant impact on the average of a sample of 50,000."

Answer (0 votes)

It seems to me that the core of your question has nothing to do with the Law of Large Numbers and everything to do with why the physical universe behaves in the ways that mathematics predicts.

You might as well ask this: Whenever I have two of something in my left hand and three of something in my right hand, I find that I have five of that something altogether. I understand that mathematics predicts this, but why should the Universe obey?

Or: Mathematics tells me that for any numbers x and y, if I have x piles of stones with y stones in each pile, and you have y piles of stones with x in each pile, then we'll each have the same number of stones. What's the empirical evidence for this law? Why should we expect the Universe to behave this way just because mathematics says it should?

I don't know what answers to these questions you'd consider satisfactory, but I think you'll gain some insight if you concentrate on these much simpler questions, where the fundamental issues are exactly the same as in the question you're asking.

Answer (0 votes)

Strong Mathematical explanation.

First I present another experiment which, I believe, will be of interest to you.

Let $x_1,x_2, \cdots$ be an infinite sample obtained by observation of independent and normally distributed real-valued random variables with parameters $(\theta,1)$, where $\theta$ is an unknown mean and the variance is equal to $1$. Using this infinite sample we want to estimate the unknown mean. If we denote by $\mu_{\theta}$ the Gaussian measure on ${\bf R}$ with probability density $\frac{1}{\sqrt{2\pi}}e^{-\frac{(x-\theta)^2}{2}}$, then the triplet $$({\bf R}^N,\mathcal{B}({\bf R}^N),\mu_{\theta}^N)_{\theta \in {\bf R}}$$ is the statistical structure describing our experiment, where ${\bf R}^N$ is the Polish topological vector space of all infinite samples equipped with the Tychonoff metric and $\mathcal{B}({\bf R}^N)$ is the $\sigma$-algebra of Borel subsets of ${\bf R}^N$. By virtue of the Strong Law of Large Numbers we have $$ \mu_{\theta}^N(\{(x_k)_{k \in N}: (x_k)_{k \in N}\in {\bf R}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\})=1 $$ for each $\theta \in {\bf R}$, where $\mu_{\theta}^N=\mu_\theta \times \mu_\theta \times \cdots$.

We would expect that, using our infinite sample $(x_k)_{k \in N}$ and the consistent estimator $\overline{X}_n= \frac{\sum_{k=1}^nx_k}{n}$ as $n$ tends to $\infty$, we obtain a "good" estimate of the unknown parameter $\theta$. But look at the set $$ S=\{ (x_k)_{k \in N}: (x_k)_{k \in N}\in {\bf R}^N~\&~\mbox{there exists a finite limit } \lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}\}. $$ It is a proper vector subspace of ${\bf R}^N$ and hence is "small" (more precisely, it is a Haar null set in the sense of Christensen (1973)). This means that our "good" statistic is not defined on the complement of $S$, which is a "big" set (more precisely, prevalent in the sense of Christensen (1973)).

This means that for "almost every" infinite sample (in the sense of Christensen), our "good" statistic, the sample average $\overline{X}_n$, has no limit.


Now let $x_1,x_2, \cdots$ be an infinite sample obtained by coin tosses. Then the statistical structure describing this experiment has the form $$ \{(\{0,1\}^N,B(\{0,1\}^N),\mu_{\theta}^N): \theta \in (0,1)\} $$ where $\mu_{\theta}(\{1\})=\theta$ and $\mu_{\theta}(\{0\})=1-\theta$. By virtue of the Strong Law of Large Numbers we have $$ \mu_{\theta}^N(\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\})=1 $$ for each $\theta \in (0,1)$. Note that $G:=\{0,1\}^N$ can be considered as a compact group. Since the measure $\mu_{0.5}^N$ coincides with the probability Haar measure $\lambda$ on the group $G$, we deduce that the set $A(0.5)=\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=0.5\}$ is prevalent. Since $A(\theta) \subset G \setminus A(0.5)$ for each $\theta \in (0,1)\setminus \{1/2\}$, where $$A(\theta)=\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\},$$ we deduce that these sets are all Haar null.

My answer to the question "Why is the universe NOT indifferent towards big samples of coin tosses? What is the objective reason for this phenomenon?" is the following: the set of infinite samples $(x_k)_{k \in N}\in G:=\{0,1\}^N$ for which the limit of the sample average $\overline{X}_n$ exists as $n$ tends to $\infty$ and equals $0.5$ is prevalent in the sense of Christensen (1973); equivalently, it has full Haar $\lambda$-measure. Hence the Strong Law of Large Numbers is not empirically proven.