Why does experimental probability approach theoretical probability? Why does it converge only when the samples are large and not when they are small?

I went through Khan Academy's lecture on theoretical and experimental probability. I also read through a Wikipedia article on this, but it was still not clear to me. I understand how it approaches (as explained in the video), but I am unable to understand why experimental probability approaches theoretical probability. What is the reason for this?

I think that the general sense is: if I take a large enough sample, I am going to end up getting the expected mean of the sample. The more experiments I do, the more it converges. Sure, I get that. But why does it converge only when the samples are large and not when they are small?

Best answer

You think you are "unable to understand why experimental probability approaches theoretical probability". We all are. Experimental probability means really throwing physical coins or needles, or picking colored balls from an urn, and counting the various outcomes.

On the other hand, probability theory is a mathematical edifice whose purpose is to talk coherently about events and processes considered "random". Take throwing a coin as an example. At the beginning we only postulate that in a single throw we see $H$ or $T$ with equal probabilities ${1\over2}$, whatever that means. We then create the idea of independence. This entails that when throwing the coin $n$ times all $2^n$ binary strings over $\{H,T\}$ have the same probability ${1\over2^n}$. In this model we don't know which string we shall observe, but we can prove that with high probability we see about ${n\over2}$ $H$s. This means that the model behaves in the way we intuitively think about probabilities. But the model then also has answers to more difficult questions, e.g., how often we will (on average) have to throw the coin in order to see a run of $10$ $H$s.
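To make "with high probability we see about ${n\over2}$ $H$s" concrete inside the model, here is a minimal Python sketch that counts, among the $2^n$ equally likely strings, those whose fraction of $H$s lies within a tolerance of ${1\over2}$. The function name `prob_fraction_near_half` and the tolerance ${1\over20}$ are illustrative choices, not part of the original argument.

```python
from fractions import Fraction
from math import comb


def prob_fraction_near_half(n, eps):
    """Exact P(|heads/n - 1/2| <= eps) for n independent fair coin throws.

    Every one of the 2**n strings over {H, T} has probability 1/2**n,
    and comb(n, k) of them contain exactly k heads.
    """
    half = Fraction(1, 2)
    # Exact rational comparison avoids floating-point trouble at the boundary.
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(Fraction(k, n) - half) <= eps
    )
    return favourable / 2 ** n


# Illustrative tolerance: fraction of heads within 1/20 of 1/2.
eps = Fraction(1, 20)
for n in (10, 100, 1000):
    print(n, prob_fraction_near_half(n, eps))
```

For $n=10$ only the strings with exactly $5$ $H$s qualify, so the probability is $252/1024\approx 0.25$; for the larger values of $n$ tried here the printed probability moves much closer to $1$. This is a statement about the model only; it does not by itself explain what a physical coin does.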

Concerning "convergence": This notion by its very name requires large numbers of "experiments" within our model. One then, e.g., proves that, for an infinite sequence of coin throws, with probability $1$ the fraction of $H$s converges to ${1\over2}$. Again: Why this seems to be the case also when you throw a coin $10^6$ times in your lab, nobody knows.
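One standard way to see, within the model, why large $n$ is needed is Chebyshev's inequality. As a sketch, write $S_n$ for the number of $H$s in $n$ independent fair throws (the symbol $S_n$ is notation introduced here, not used above). Then $S_n/n$ has mean ${1\over2}$ and variance ${1\over4n}$, so for any $\varepsilon>0$

$$P\left(\left|\frac{S_n}{n}-\frac12\right|\ge\varepsilon\right)\le\frac{\operatorname{Var}(S_n/n)}{\varepsilon^2}=\frac{1}{4n\varepsilon^2}.$$

With $\varepsilon=0.05$ the right-hand side equals $10$ for $n=10$, which says nothing at all, but it equals $10^{-4}$ for $n=10^6$. The bound shrinks only as $n$ grows, which is the sense in which the model promises closeness to ${1\over2}$ for large samples and not for small ones.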

But maybe other people think differently. There is a large literature about the "philosophy of probability".