Why do we consider Borel sets instead of (Lebesgue) measurable sets?


Dumb/Challenging conventional wisdom question possibly related to my previous question.

Why do we sometimes consider a measure space $(S, \Sigma, \mu) = (\mathbb{R}, \mathscr{B}(\mathbb{R}), \lambda)$ where $\lambda$ is Lebesgue measure rather than $(S, \Sigma, \mu) = (\mathbb{R}, \mathscr{M}(\mathbb{R}), \lambda)$ where $\mathscr{M}(\mathbb{R})$ is the set of $\lambda$-measurable subsets of $\mathbb{R}$? I mean, there are subsets of $\mathbb{R}$ that are not Borel sets but $\lambda$-measurable right? If there are none, I guess that answers the first question.

Possibly answered by above but why, in my previous question, is it 'natural' to consider $\mathscr{F}$? I'm guessing it's like why it's 'natural' to consider $\mathscr{B}(\mathbb{R})$.

Possibly related:

Why do probabilists take random variables to be Borel (and not Lebesgue) measurable?


8
On BEST ANSWER

Well, I'd say it depends on the context, but one reason that comes to mind is that the Borel $\sigma$-algebra is simpler (and smaller) than the Lebesgue $\sigma$-algebra $\mathscr{M}(\mathbf{R})$. For a lot of things, the setting of Borel functions or the Borel $\sigma$-algebra is enough for what you want to do; using the Lebesgue $\sigma$-algebra would only make the proofs harder, or even invalidate the results you want to prove.

An example of the "harder proofs" part: the $\sigma$-algebra $\mathscr{B}(\mathbf{R})$ is generated by the open sets of $\mathbf R$, and a lot of proofs use this fact. Unfortunately, the situation is more complex with $\mathscr{M}(\mathbf R)$.

An example of the "invalidating results" part: it's easy to show that if $f$ and $g$ are Borel, then $f\circ g$ is also Borel. However, if you define a measurable function to be a function $f$ such that $f^{-1}(U)\in \mathscr M (\mathbf R)$ for every open set $U\subset \mathbf R$, then the composition of two measurable functions is not measurable in general.
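To make the contrast concrete, here is a short sketch of the argument (the open set $U$ is the only symbol introduced; this is the standard preimage computation, not anything specific to a particular textbook). For Borel $f$ and $g$ and an open set $U$:

$$(f\circ g)^{-1}(U) = g^{-1}\big(f^{-1}(U)\big), \qquad f^{-1}(U)\in\mathscr B(\mathbf R) \ \Longrightarrow\ g^{-1}\big(f^{-1}(U)\big)\in\mathscr B(\mathbf R).$$

The last implication uses that a Borel function pulls back every Borel set (not just every open set) to a Borel set. If $f$ and $g$ are only measurable in the sense above, the same computation gives $f^{-1}(U)\in\mathscr M(\mathbf R)$, which need not be Borel, and the definition says nothing about $g^{-1}$ of such a set. That is exactly where the argument breaks.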

Side note: the fact that the composition of two measurable functions need not be measurable is closely related to the fact that some functions are Borel but not Lebesgue (where "$f$ is Lebesgue" means $f^{-1}(U)\in \mathscr M (\mathbf R)$ for every $U\in \mathscr M (\mathbf R)$). There is an exercise in Folland's Real Analysis about that, if I remember correctly. But $\mathscr M (\mathbf R)$ is absolutely crucial in integration theory: indeed, there are functions that are Riemann integrable but not Borel (think of the characteristic function of a non-Borel subset of the triadic Cantor set).

To finish: yes, $\mathscr M (\mathbf R)\backslash\mathscr B (\mathbf R)$ is nonempty. But you have the following result:

If $A\in \mathscr M (\mathbf R)\backslash\mathscr B (\mathbf R)$, then there exist two Borel sets $M$ and $N$ such that $M\subset A$, $A\subset M \cup N$ and $\lambda(N)=0$ (so $A$ is a Borel set up to some non-Borel negligible set). Moreover, one has $\lambda(A)=\lambda(M)$.
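The equality of measures in this result follows in one line from monotonicity and subadditivity of $\lambda$, using only $M\subset A\subset M\cup N$ and $\lambda(N)=0$:

$$\lambda(M)\ \le\ \lambda(A)\ \le\ \lambda(M\cup N)\ \le\ \lambda(M)+\lambda(N)\ =\ \lambda(M).$$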

2
On

I'm not sure what you have in mind when you say example, but if you look in a basic (undergraduate level) probability book you'll see they really struggle with the fact that you can't give a probability to any arbitrary event. The question then becomes what subset of $2^{\mathbb R}$ you want to consider. There's some desire to be as broad as possible, but the basic machinery that you need to develop is quite difficult, and perhaps too difficult for the typical student of classical probability, who will probably never encounter in the real world an event that is not a Borel set. I find in teaching probability that even Borel sets are too complicated for the typical student, who is a scientist who just wants to test the significance of their data. Such people will probably never encounter an event that's not an interval, or at most a union of two or three intervals, and they would be lost in the difficult details of analysis necessary to include more sets than they would ever actually need. That's why some authors, e.g. Larson, sidestep the entire issue of non-measurable sets completely: they just warn the student that not all subsets of $\mathbb R$ can be events and then move on.

0
On

Probability perspective:

PART I

It is more natural to use Borel sets than Lebesgue measurable sets because

  1. From elementary probability, the cumulative distribution function is $$F_X(x) = P(X \le x) = P(\{X \le x\}) = P(X \in (-\infty, x]).$$ That is, we often investigate probabilities (measures) of random variables (measurable functions) lying in sets of the form $(-\infty, x]$.

  2. Now if you collect those sets, we have a $\pi$-system:

$$\pi(\mathbb R) := \{(-\infty, x] \ | \ x \in \mathbb R \}$$

Then finally

$$\sigma(\pi(\mathbb R)) = \mathscr B (\mathbb R)$$

Thus, the sets in $\mathscr M (\mathbb R) \setminus \mathscr B (\mathbb R)$ don't really need to be considered.
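For completeness, the identity $\sigma(\pi(\mathbb R)) = \mathscr B(\mathbb R)$ can be checked with two set identities (a sketch; the endpoints $a<b$ are introduced here for illustration):

$$(a,b] = (-\infty,b]\setminus(-\infty,a], \qquad (a,b) = \bigcup_{n\ge 1}\Big(a,\ b-\tfrac1n\Big].$$

So $\sigma(\pi(\mathbb R))$ contains every open interval, hence every open set (a countable union of open intervals), hence $\mathscr B(\mathbb R)$. Conversely, each $(-\infty,x]$ is closed and therefore Borel, which gives $\sigma(\pi(\mathbb R)) \subseteq \mathscr B(\mathbb R)$.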

PART II

Also in line with this (and probably explaining why this is a dumb question), Borel sets live on the output side of random variables rather than the input side; the measurable sets of the underlying space are the input. Thus Lebesgue measure, or whatever measure ($\mathbb P$) is used, serves more for computing the probability that a random variable lies in a given Borel set than for computing the length of Borel sets themselves.

1
On

The first observation to make here is there's a difference between a measurable space (that is, a set paired with a $\sigma$-algebra) and a measure space (that is, a measurable space equipped with a measure). I challenge you to find examples where the measure space $(\mathbb{R},\mathscr{B}(\mathbb{R}),\lambda)$ is being studied in a non-trivial manner without implicit reference to $\mathscr{M}(\mathbb{R})$. On the other hand, the measurable space $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ is a natural place to do measure theory.

The issue with $(\mathbb{R},\mathscr{B}(\mathbb{R}),\lambda)$ is that it is not a complete measure space. That is, subsets of $\lambda$-null sets need not be measurable. In a sense, once you know what measure you want to put on your measurable space, you should complete it immediately (which you're free to do). Why? Many of the theorems of integration are more naturally stated if you assume completeness, and you're less likely to run into annoying paradoxes. For example, if you give me a set $B \in \mathscr{M}(\mathbb{R}) \setminus \mathscr{B}(\mathbb{R})$, then $\chi_{B}$ is Lebesgue but not Borel measurable. Nonetheless, I can find a sequence $(\varphi_{n})_{n \in \mathbb{N}}$ of smooth functions such that $\varphi_{n}(x) \to \chi_{B}(x)$ for almost every $x \in \mathbb{R}$. So without completeness, an almost-everywhere limit of Borel functions can fail to be Borel.
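For reference, the completion alluded to here can be written out explicitly (a sketch in standard notation; the symbols $Z$, $N$ and $\overline{\lambda}$ are introduced for illustration):

$$\overline{\mathscr{B}(\mathbb{R})} = \big\{\, B\cup Z \ :\ B\in\mathscr{B}(\mathbb{R}),\ Z\subseteq N \text{ for some } N\in\mathscr{B}(\mathbb{R}) \text{ with } \lambda(N)=0 \,\big\}, \qquad \overline{\lambda}(B\cup Z) := \lambda(B).$$

For Lebesgue measure this completion is exactly $\mathscr{M}(\mathbb{R})$, which is why completing $(\mathbb{R},\mathscr{B}(\mathbb{R}),\lambda)$ lands you in $(\mathbb{R},\mathscr{M}(\mathbb{R}),\lambda)$.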

The way I think of this is, once you have a measure in mind, you should complete the space. I haven't come across a counter-example to this rule of thumb yet.

While I don't think $(\mathbb{R},\mathscr{B}(\mathbb{R}),\lambda)$ is a very interesting measure space, $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ is an important measurable space. This is easiest to see in the context of probability.

Suppose we have a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and we're interested in a function $X : \Omega \to \mathbb{R}$, as one often is. Probably we're not interested in knowing $X$ pointwise, that is, in understanding the function $\omega \mapsto X(\omega)$, since somewhat implicit in the set-up of a probability space is we don't know which outcomes $\omega$ in $\Omega$ we're dealing with. The set-up of measure-theoretic probability is instead that we would like to know $\mathbb{P}\{X \in A\}$ for a rich enough collection of subsets $A$ of $\mathbb{R}$. This is only well-defined if the events $\{X \in A\}$ are in $\mathcal{F}$ to begin with. In other words, we need to understand a little bit about the push forward $\sigma$-algebra $\mathcal{F}_{X} = \{A \subseteq \mathbb{R} \, \mid \, X^{-1}(A) \in \mathcal{F}\}$.

What sets should $\mathcal{F}_{X}$ contain in general? This is up to how you want to define probability theory. The standard set-up is that $X$ is a random variable if the push forward $\sigma$-algebra contains $\mathscr{B}(\mathbb{R})$. Why? Well, one way to think about random variables is to ask that, at the very least, we should be able to compute $\mathbb{P}\{X \leq c\}$ for arbitrary real numbers $c$. More rigorously, we should demand that $\{X \leq c\} \in \mathcal{F}$ (or $(-\infty,c] \in \mathcal{F}_{X}$) independently of the choice of $c$.

However, the collection $\{(-\infty,c] \, \mid \, c \in \mathbb{R}\}$ generates $\mathscr{B}(\mathbb{R})$, so if $\mathcal{F}_{X} \supseteq \{(-\infty,c] \, \mid \, c \in \mathbb{R}\}$, then, since $\mathcal{F}_{X}$ is a $\sigma$-algebra, it contains $\mathscr{B}(\mathbb{R})$. In fact, the following result holds:

The following are equivalent:

1) $\mathscr{B}(\mathbb{R})\subseteq \mathcal{F}_{X}$

2) $\forall c \in \mathbb{R} \quad \{X \leq c\} \in \mathcal{F}$

3) $\forall c \in \mathbb{R} \quad \{X < c\} \in \mathcal{F}$

4) $\forall c \in \mathbb{R} \quad \{X \geq c\} \in \mathcal{F}$

5) $\forall c \in \mathbb{R} \quad \{X > c\} \in \mathcal{F}$

6) If $U \subseteq \mathbb{R}$ is open, then $\{X \in U\} \in \mathcal{F}$

7) If $C \subseteq \mathbb{R}$ is closed, then $\{X \in C\} \in \mathcal{F}$
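A sketch of why these conditions are equivalent (standard identities, written here in the notation above). Conditions 2) through 5) imply one another via countable unions, countable intersections and complements in $\mathcal{F}$:

$$\{X<c\} = \bigcup_{n\ge 1}\Big\{X\le c-\tfrac1n\Big\}, \qquad \{X\le c\} = \bigcap_{n\ge 1}\Big\{X< c+\tfrac1n\Big\}, \qquad \{X>c\} = \Omega\setminus\{X\le c\}, \qquad \{X\ge c\} = \Omega\setminus\{X<c\}.$$

To pass from any of them to 1), one uses that preimages commute with set operations, which is what makes $\mathcal{F}_{X}$ a $\sigma$-algebra:

$$X^{-1}(\mathbb{R}\setminus A) = \Omega\setminus X^{-1}(A), \qquad X^{-1}\Big(\bigcup_{n} A_n\Big) = \bigcup_{n} X^{-1}(A_n).$$

Conditions 6) and 7) are equivalent to each other by taking complements, and each is equivalent to 1) because the open sets generate $\mathscr{B}(\mathbb{R})$.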

The point is if you want to be able to ask questions about random variables involving the topology (or ordering) of $\mathbb{R}$, then you need the random variable to be measurable from $(\Omega,\mathcal{F},\mathbb{P})$ into $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ at a minimum. This is why the modern theory of probability defines (real-valued) random variables in terms of $(\mathbb{R},\mathscr{B}(\mathbb{R}))$.