How is it possible to teach probability theory without sigma fields?


In introductory undergraduate probability courses, even those with a focus on set theory, I've often seen the definition of a sigma field skipped over entirely. Indeed, I've often seen the definition of a probability space include the claim "where $\mathcal{F}$ is the set of all events". Now, I have no doubt that sigma fields are necessary in probability theory, so I'm forced to ask: how is it that these undergraduate courses manage to teach probability theory without making any mention of sigma fields? What do they lose by skipping this definition? Or rather, what do they have to manipulate in order to avoid it?

At the very least, I can recall from my own time as a first-year undergraduate that I would occasionally get confused over what exactly an "event" is, and at least in principle I can see where "the set of all events" part of the definition could be confusing. For example, rolling a six-sided die would give you a perfectly valid sigma field of { {6}, {not 6}, $\emptyset$, $\Omega$ }, which wouldn't seem to fit the earlier definition (e.g. where's the "not 5" event?), unless you cheat the sample space in some way whose validity I would be unsure of.
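The four-set example above really is a legitimate sigma field, and the point about "not 5" can be made concrete. The following sketch (mine, not from the original post; names are illustrative) checks the finite-case axioms directly:

```python
# Sketch: verify that the four-set collection from the die example is a
# sigma field on a finite sample space -- contains Omega, closed under
# complement, closed under union (finite unions suffice when F is finite).

from itertools import combinations

omega = frozenset(range(1, 7))                      # six-sided die
F = {frozenset(), frozenset({6}), omega - {6}, omega}

def is_sigma_field(F, omega):
    if omega not in F:
        return False
    if any(omega - A not in F for A in F):          # closure under complement
        return False
    return all(A | B in F for A, B in combinations(F, 2))  # closure under union

print(is_sigma_field(F, omega))                     # True: a valid sigma field
# ...but {5} is simply not a member of F, so "not 5" is not an event here.
print(frozenset({5}) in F)                          # False
```

This is exactly the situation the question describes: the collection satisfies every axiom, yet "the set of all events" it defines excludes subsets one might naively expect to be events.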

Admittedly, there's an awfully cynical part of me that thinks that the answer may be "by hoping that nobody thinks too hard about it" and indeed, inspecting my notes from such courses suggests that may genuinely be the answer. But I'm hoping that the good people at Stack Exchange can show otherwise.



Accepted answer:

It is crucial for students of probability to learn that an event is simply a subset of the sample space. This is not trivial. It is deep. What we can describe in words about “what happened” in the probability experiment can be captured precisely by a subset. Arguably, a heavy emphasis on sigma fields early in a probability course can distract attention away from this important concept.

If you have ever taught an introductory probability course, you will know that students struggle with very basic things. Students forget what the equal sign means. They write “equations” where the left-hand-side is not the same type of object as the right hand side, such as having a random variable on the left and a PDF on the right. Consider the following tragic equations of this flavor:
\begin{align} &X \sim U(0,1) \implies \frac{X}{2} = \int_0^1 x dx\\ &P[X] = f(x)dx\\ &P[A] = P[A | B_i]P[B_i] \quad \forall i \end{align}

A detailed discussion of sigma fields, complete with their heavy notation and scary terminology, will not help. Arguably, even the most advanced students in the class must first master basic probability calculations before they can appreciate the foundational issues regarding non-measurability, which arise (only) when we deal with uncountably infinite sample spaces. [Recall that if the sample space $S$ is finite or countably infinite, we can without loss of rigor define the sigma field to be the set of all subsets of $S$. More restrictive sigma fields are only required for uncountably infinite sample spaces.]*

Rhetorical question: Would students ever learn arithmetic if, first, they were forced to read the foundational work Principia Mathematica by Russell and Whitehead?


*Edit [A footnote to my bracketed sentence above]: Let $S$ be a finite or countably infinite sample space. Define $F$ as the set of all subsets of $S$. Then for any collection of values $(p(\omega))_{\omega \in S}$ that satisfy $p(\omega) \geq 0$ for all $\omega \in S$ and $\sum_{\omega \in S} p(\omega) = 1$, we can define the following function $P:F\rightarrow\mathbb{R}$: $$ \boxed{P[A] = \sum_{\omega \in A} p(\omega) \quad \forall A \subseteq S \quad (\mbox{Equation *})}$$ It is easy to see that this function $P$ satisfies the three axioms of probability and hence it is a valid probability measure:

  1. $P[A] \geq 0$ for all $A \subseteq S$.

  2. $P[S] = 1$.

  3. If $\{A_1, A_2, A_3, ...\}$ is a sequence of disjoint subsets of $S$ then $P[\cup_{i=1}^{\infty} A_i] = \sum_{i=1}^{\infty} P[A_i]$.

Conversely, any function $P:F\rightarrow\mathbb{R}$ that satisfies the above three axioms must have the form of (Equation *) with $p(\omega) = P[\{\omega\}]$ for all $\omega \in S$. The values $p(\omega)$ form the probability masses for each outcome of the finite or countably infinite sample space.

Thus, for finite or countably infinite sample spaces, it makes sense to define the sigma algebra $F$ (which contains all events) as the set of all possible subsets of the sample space.
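The construction in (Equation *) can be sketched in a few lines of code (mine, not the answerer's; the fair-die masses are an illustrative choice), checking the three axioms over the full power set:

```python
# Sketch of (Equation *): on a finite sample space, probability masses
# p(omega) determine the whole measure via P[A] = sum of p over A, with
# the sigma field taken to be the entire power set of S.

from fractions import Fraction
from itertools import chain, combinations

S = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in S}          # fair-die masses: p >= 0, sum = 1

def P(A):
    """P[A] = sum_{omega in A} p(omega)  -- (Equation *)."""
    return sum(p[w] for w in A)

def power_set(S):
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

# Axioms 1 and 2, checked over every subset of S:
assert all(P(A) >= 0 for A in power_set(S))
assert P(S) == 1

# Additivity on a pair of disjoint events:
A, B = {1, 2}, {5, 6}
assert P(A | B) == P(A) + P(B)

print(P({w for w in S if w != 5}))          # 5/6: the "not 5" event exists here
```

With the full power set as $F$, every subset is an event, which is exactly why the measurability subtleties never surface in the countable case.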

Answer:

It is possible to do probability theory without $\sigma$-fields. There are axiomatizations of probability theory other than Kolmogorov's. For example, Bruno de Finetti's axiomatization naturally leads to (subjective) Bayesian statistics, and it does not assume countable additivity!

As to what you lose by not introducing $\sigma$-fields in the usual undergraduate introduction to probability: very little. You can avoid the term entirely and just say "$\mathcal{F}\subseteq\mathcal{P}(\Omega)$ satisfying conditions ..." (the usual conditions of being a $\sigma$-field, without introducing the name), and leave $\sigma$-fields to an advanced course on integration and measure theory.

Answer:

For me, the main moral about $\sigma$-fields is that "not everything you can possibly think of is an event". That comes as a big surprise when one first encounters such situations. Fortunately, for most elementary theorems in introductory probability courses one can play the game in a rather safe way even if one has a naive idea that every subset of the probability space is measurable.

Countable additivity, however, is rather indispensable and (IMHO) should be emphasized and discussed at length. The claim that (assuming Choice) it is incompatible with the naive idea above for plenty of natural distributions should be made, but it need not be discussed in depth; curious students can simply be referred to other courses and textbooks.

Poor or absent knowledge of Lebesgue integration is another major nuisance, which makes even such a natural split as $E[X]=E[X\cdot 1_{\{X<a\}}]+E[X\cdot 1_{\{X\ge a\}}]$ a technical nightmare. Here you are also often forced to say that the validity of the computations you are making will be justified elsewhere.
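The split of the expectation mentioned above is trivial for discrete random variables, which is one way to reassure students before deferring the general justification. A numeric sanity check (illustrative, not from the answer):

```python
# Check the split  E[X] = E[X * 1{X < a}] + E[X * 1{X >= a}]
# for a discrete X, where the two indicators partition the outcomes.

from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]               # fair die
mass = Fraction(1, 6)
a = 4

E_X = sum(x * mass for x in outcomes)
E_low = sum(x * mass for x in outcomes if x < a)    # E[X * 1{X < a}]
E_high = sum(x * mass for x in outcomes if x >= a)  # E[X * 1{X >= a}]

print(E_X)                  # 7/2
print(E_low + E_high)       # 7/2 -- the split holds exactly
```

The "technical nightmare" only appears for general (e.g. continuous) $X$, where justifying the same manipulation properly requires Lebesgue integration.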

In short, you are completely right that it would be nice to put a rigorous foundation under everything, but in practice I have found that, more often than not, my main problems when teaching such courses are not at the level of questionable rigor but much, much lower, so the cynical advice to "just hope that nobody would notice anything" has certain merits.

As to alternative axiomatizations of probability, I'm afraid that, when done properly, they will just create a total mess in the average undergraduate's mind and leave students unable to read standard texts afterwards, though I may be overly pessimistic here. However, the following excerpt from a review of Bruno de Finetti's textbook, taken from the Bulletin of the American Mathematical Society, Volume 83, Number 1, January 1977, makes me wonder whether one just replaces the prerequisite of elementary measure theory with that of advanced game theory:

After constructing a utility scale, de Finetti introduces probability via expectation, which he calls prevision. The prevision of a random variable X, "according to your opinion, is the value x which You would choose" if "You are committed to accepting any bet whatsoever with gain c(X - x) where c is arbitrary (positive or negative) at the choice of an opponent" (I, p. 87). An event E is regarded as a special case of a random variable, taking the value 0 or 1 depending on whether E is (verifiably) false or true, and its prevision P(E) is also called its (subjective) probability. De Finetti shows that an equivalent definition of the prevision P(X) is obtained by assuming a squared loss, proportional to $(X - x)^2$, and he produces a very ingenious but elementary geometrical argument, based on squared loss, to prove the product law of probabilities (I, p. 137; and 0, p. 15).
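The squared-loss claim at the end of the excerpt amounts to a standard one-line calculus fact (this derivation is mine, not quoted from de Finetti): minimizing the expected penalty over the announced value $x$ recovers the expectation,
\begin{align}
\frac{d}{dx}\, E\big[(X - x)^2\big] = -2\, E[X - x] = 0 \implies x = E[X],
\end{align}
so the prevision defined via squared loss coincides with the usual expected value.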