I was at a restaurant with some friends who study pure math. They were having a discussion about the relationship between Measure Theory (https://en.wikipedia.org/wiki/Measure_(mathematics)) and Probability Theory (https://en.wikipedia.org/wiki/Probability_theory). I tried to follow the discussion, but unfortunately I could not.
Prior to the conversation, here is what I already knew:
I loosely know what Probability Theory is: in the undergraduate-level courses I took in university, I learned about concepts from Probability Theory in a very applied way, including the mathematical properties of Probability Distribution Functions and Random Variables.
I am less familiar with Measure Theory. Based on some readings I have done, it seems like Measure Theory is concerned with assigning "quantities" to "subsets of a set". For example, if you flip 2 coins, the sample space is {HH, HT, TH, TT}. In this context, the "measure" of a subset is a probability: a special function called a "Probability Measure" assigns a probability to each subset (event) of the sample space, not just to each individual outcome.
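To make that idea concrete, here is a small Python sketch (my own toy illustration, not taken from any library): it builds a probability measure on the two-coin sample space by summing per-outcome weights, so the function genuinely takes *subsets* as input.

```python
# Sample space for two fair coin flips; each outcome gets weight 1/4.
omega = ["HH", "HT", "TH", "TT"]
weights = {outcome: 0.25 for outcome in omega}

def prob(event):
    """A probability measure: assigns a number to a SUBSET of the sample space."""
    return sum(weights[outcome] for outcome in event)

at_least_one_head = {"HH", "HT", "TH"}
print(prob(set()))              # the empty event has probability 0
print(prob(at_least_one_head))  # 3/4
print(prob(set(omega)))         # the whole sample space has probability 1
```

The point of the sketch is that `prob` is defined on subsets of `omega`, which is exactly the "assigning quantities to subsets of a set" picture.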
The mathematician Andrey Kolmogorov (https://en.wikipedia.org/wiki/Andrey_Kolmogorov) created a set of mathematical axioms for Probability Theory (https://en.wikipedia.org/wiki/Probability_axioms). For any given "experiment" (i.e. probability space) there must be a sample space, an event space (an "event" corresponds to a subset of the sample space), and a probability measure which assigns a probability to each event. Kolmogorov's Axioms tell us that negative probabilities are not possible, that the probability of the entire sample space is 1, and that the probability of a union of (countably many) disjoint events is the sum of their individual probabilities. These axioms then allow us to derive important rules that can be used to further analyze and interpret probabilities, e.g. P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
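As a sanity check on that last identity, here is a small Python sketch (again my own toy example) on the two-coin sample space, with A = "first flip is heads" and B = "second flip is heads"; inclusion-exclusion comes out exactly as the derived rule predicts.

```python
# Two fair coin flips: four equally likely outcomes, uniform measure.
omega = {"HH", "HT", "TH", "TT"}
P = lambda event: len(event) / len(omega)

A = {o for o in omega if o[0] == "H"}  # first flip is heads
B = {o for o in omega if o[1] == "H"}  # second flip is heads

lhs = P(A | B)                   # P(A ∪ B)
rhs = P(A) + P(B) - P(A & B)     # inclusion-exclusion
print(lhs, rhs)                  # both equal 0.75
```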
Now, here is what I did not understand in the conversation:
Supposedly Kolmogorov's Axioms were so important that they revolutionized the field of Probability Theory. Kolmogorov was the first to explicitly describe the relationship between Probability Theory and Measure Theory.
But why exactly were these Axioms so important? How exactly did the relationship between Measure Theory and Probability Theory revolutionize Probability Theory?
If I understand things correctly, it seems like the field of Probability Theory made significant progress before Kolmogorov was even born. For example, the Normal Distribution (https://en.wikipedia.org/wiki/Normal_distribution) was studied by Gauss long before Kolmogorov's birth. Likewise, important results in Probability Theory such as Chebyshev's Inequality (https://en.wikipedia.org/wiki/Chebyshev%27s_inequality) and Markov's Inequality (https://en.wikipedia.org/wiki/Markov%27s_inequality) were established before Kolmogorov's axioms appeared. Thus, if the relationship between Measure Theory and Probability Theory is so important, how were these results possible when this relationship had not yet been defined?
In other words: What "things" could not have been done prior to defining this relationship between Probability Theory and Measure Theory? And what "things" could now be done after defining this relationship between Probability Theory and Measure Theory?
To summarize - why is the relationship between Measure Theory and Probability Theory important?
Can someone please help me understand these points?
Thanks!
I think Durrett is a great reference for understanding the connection here. The organization of his textbook neatly lays out the relationship:
Chapter 1 starts with the pure measure theory side of probability. A probability space $(\Omega, \mathcal{F}, \Bbb{P})$ is a measure space with the property that the measure of the entire space is finite and equal to 1: $\Bbb{P}(\Omega) = 1$. From there he defines random variables (i.e. measurable functions $X: \Omega \to S$, where the codomain $S$ is often $\Bbb{R}^d$ with the Borel $\sigma$-algebra) and expectation (i.e. the integral $\Bbb{E}X := \int X \, d\Bbb{P}$ of such a random variable, defined when $X$ is integrable), with a review of the basic measure theory results (e.g. Fatou's lemma) needed to proceed.
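As a tiny worked instance of that definition (my own example, not from Durrett): take $\Omega = \{HH, HT, TH, TT\}$ with the uniform measure $\Bbb{P}(\{\omega\}) = \tfrac14$, and let $X$ be the number of heads. On a finite space the abstract integral reduces to a weighted sum:

$$\Bbb{E}X = \int_\Omega X \, d\Bbb{P} = \sum_{\omega \in \Omega} X(\omega)\,\Bbb{P}(\{\omega\}) = 2\cdot\tfrac14 + 1\cdot\tfrac14 + 1\cdot\tfrac14 + 0\cdot\tfrac14 = 1.$$

This is the familiar "sum of value times probability" formula from applied courses, recovered as a special case of the measure-theoretic integral.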
My favorite line in this entire book kicks off Chapter 2 on Laws of Large Numbers: "Measure theory ends and probability begins with the definition of independence." Durrett walks us through the definitions of independent events, independent $\sigma$-algebras, and independent r.v.'s, and looks at classic results such as weak & strong LLN, as well as modes of convergence of random variables and random series.
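To connect that quote back to the coin example (my own toy illustration): under the uniform measure on two fair flips, "first flip is heads" and "second flip is heads" satisfy the product rule $\Bbb{P}(A \cap B) = \Bbb{P}(A)\,\Bbb{P}(B)$, i.e. they are independent, while "first flip is heads" and "at least one head" are not. This product-rule structure is exactly the extra ingredient beyond bare measure theory.

```python
# Two fair coin flips under the uniform probability measure.
omega = {"HH", "HT", "TH", "TT"}
P = lambda event: len(event) / len(omega)

A = {o for o in omega if o[0] == "H"}  # first flip heads
B = {o for o in omega if o[1] == "H"}  # second flip heads
C = {"HH", "HT", "TH"}                 # at least one head

# A and B are independent: P(A ∩ B) equals P(A) * P(B).
print(P(A & B), P(A) * P(B))  # 0.25 0.25

# A and C are NOT independent: P(A ∩ C) differs from P(A) * P(C).
print(P(A & C), P(A) * P(C))  # 0.5 0.375
```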
So essentially, probability studies a specific type of measure space (a finite one, whose total measure is normalized to equal $1$), but the nomenclature is tailored to the applications: we speak of "events" instead of "measurable sets", "random variables" instead of "measurable functions", etc., and we also take an interest in independence relationships between events/r.v.'s/etc., which do not have a natural analogue in more general measure spaces (particularly infinite ones).