Why is a $\sigma$-algebra the available information?


In probability, if we have a set $\Omega$, we say that a $\sigma$-algebra is the information available, but I don't understand what this really means. Could someone give some intuition for this fact? (Maybe my question is missing context, so feel free to add more context if necessary.)

4 Answers

Answer 1

Given a set $\Omega$ and a probability measure $\cal P$, the $\sigma$-algebra is the collection of sets that are $\cal P$-measurable: that is, the sets to which $\cal P$ assigns a numerical value between $0$ and $1$ (inclusive). The $\sigma$-algebra is the 'information available' in the sense that any event that can have a probability assigned to it lies in the $\sigma$-algebra.

Take, for example, tossing a coin twice. $\Omega = \{HH, HT, TH, TT\}$, and the probability measure assigns $1/4$ to each outcome. We can derive answers to questions like "what's the probability that we get a head first when we toss a coin twice?" because the $\sigma$-algebra is closed under unions, intersections and complements, so the event $\{HH, HT\}$ lies in it. However, we have no information about "what's the probability of getting heads when tossing the coin once?": a single toss $H$ is not an outcome in this $\Omega$, so no combination of $\sigma$-algebra operations produces an event whose probability is ${\cal P}(H)$.
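To make this concrete, here is a small Python sketch (the variable names are mine, not from the answer): on a finite space the full power set is a $\sigma$-algebra, and the event "first toss is heads" is a union of outcomes, hence has a probability.

```python
from itertools import chain, combinations

# Sample space for two coin tosses; each outcome has probability 1/4.
omega = {"HH", "HT", "TH", "TT"}

def power_set(s):
    """All subsets of s -- on a finite space, the largest sigma-algebra."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

sigma_algebra = power_set(omega)          # 2**4 = 16 events
P = {A: len(A) / len(omega) for A in sigma_algebra}

# "First toss is heads" is the union {HH} u {HT}, so it is an event:
first_heads = frozenset({"HH", "HT"})
print(P[first_heads])                     # 0.5
```

Note that no event in this list corresponds to a single-toss experiment, which is the answer's point: only questions expressible as events of this space have probabilities here.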

Answer 2

I don't think there is an intuitive, non-technical answer to this. In the case of a finite probability space, you might say we have a finite set of "atomic events" and all other events are combinations of these. For example, if we are looking at three tosses of a coin, then the atomic events I'm thinking of are things like, "The second toss comes up tails." Now we can assign a probability to each possible subset of the atomic events, in such a way that the familiar rules of probability are satisfied.
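A short Python sketch of this idea, under the answer's setup (the partition chosen here, by the result of the second toss, is just for illustration): the events generated by a partition are exactly the unions of its blocks.

```python
from itertools import product, chain, combinations

# Sample space for three coin tosses.
omega = ["".join(t) for t in product("HT", repeat=3)]   # 8 outcomes

# Partition omega by the result of the second toss (two "atoms").
atoms = [frozenset(w for w in omega if w[1] == c) for c in "HT"]

# Every event in the generated sigma-algebra is a union of atoms,
# so a partition with k blocks generates exactly 2**k events.
events = [frozenset(chain.from_iterable(blocks))
          for r in range(len(atoms) + 1)
          for blocks in combinations(atoms, r)]
print(len(events))   # 4: {}, {second is H}, {second is T}, omega
```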

When we try to extend these notions to infinite sets, however, problems arise. It turns out to be impossible to define a notion of probability so that every subset is an event and the familiar rules of probability are satisfied. As an example, consider picking a real number between $0$ and $1$ at random. If each number is equally likely to be chosen, what should the probability of each number be? There is no way to assign it a probability and retain the rule that the probability of a union of disjoint events is the sum of their probabilities. So we have to give up something, and it turns out that the useful thing to do is to restrict that rule to countable collections of disjoint events.
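Concretely, suppose each singleton $\{x\} \subset [0,1]$ had the same probability $p$. For any sequence of distinct points $x_1, x_2, \dots$, countable additivity forces $$ P\Big(\bigcup_{n=1}^{\infty}\{x_n\}\Big)=\sum_{n=1}^{\infty}p=\begin{cases}\infty, & p>0,\\ 0, & p=0,\end{cases} $$ so $p$ must be $0$; but if additivity were permitted over all (uncountably many) points at once, we would conclude $P([0,1])=0$ rather than $1$. Restricting additivity to countable families is exactly what escapes this contradiction.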

It also turns out, though this is much more difficult to show, that it is not possible to consistently assign a probability to every subset of the real numbers in the above example. This question is intimately related to the definition of the definite integral, and was worked out over a period of about $100$ years, as described in "A Radical Approach to Real Analysis," for example. So we have to restrict the events to something useful and manageable, and $\sigma$-algebras turn out to be just what the doctor ordered.

Answer 3

The simplistic view is that the $\sigma$-algebra contains those events that we know something about, by virtue of them being the events that we can assign a probability to.

But this analogy really makes more sense when you start talking about multiple $\sigma$-algebras at once (so that we can speak of having more or less information within the same overall context). In the usual situation, these $\sigma$-algebras form a filtration. A "discrete filtration" is a sequence of $\sigma$-algebras $\mathcal{F}_n$ with $\mathcal{F}_n \subset \mathcal{F}_{n+1}$. A "continuous filtration" is a continuum of $\sigma$-algebras $\mathcal{F}_t$, $t \in [0,T]$, with $\mathcal{F}_t \subset \mathcal{F}_s$ when $t \leq s$. (Neither of these terms is standard, which is why I use the scare quotes.)
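As a minimal illustration of the nesting (a hand-listed sketch for the two-toss space; the names are mine, not standard):

```python
# Discrete filtration on two coin tosses: F_n holds the events that are
# decidable after seeing the first n tosses (listed by hand here).
F0 = {frozenset(), frozenset({"HH", "HT", "TH", "TT"})}
F1 = F0 | {frozenset({"HH", "HT"}), frozenset({"TH", "TT"})}
# F2 would be the full power set (16 events); each F_n sits inside F_{n+1}.
print(F0 <= F1)   # True: information only grows along a filtration
```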

A common way to construct a filtration is $\mathcal{F}_t=\sigma(X_s : s \leq t)$, i.e. the smallest $\sigma$-algebra such that $X_s$ is measurable for all $s \leq t$. (Or you can do the analogous thing in discrete time.) In this sense $\mathcal{F}_t$ is the information that we could obtain by watching the process run only up to time $t$. Specifically, we can tell whether any event $A \in \mathcal{F}_t$ occurred (i.e. whether $\omega \in A$) by observing the process up to time $t$. Moreover, for a square-integrable random variable $Y$ not measurable with respect to $\mathcal{F}_t$, the conditional expectation $\mathbb{E}[Y \mid \mathcal{F}_t]$ is our "best guess" for $Y$ given the information we saw up to time $t$, in the sense that it is the $\mathcal{F}_t$-measurable random variable closest to $Y$ in $L^2$.
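Here is a small Python sketch of the discrete-time version (the helper `cond_exp` is hypothetical, written for this example): on a finite space, $\mathbb{E}[Y \mid \mathcal{F}_n]$ is computed by averaging $Y$ over the atom of outcomes that agree with $\omega$ on the first $n$ observations.

```python
from itertools import product

# Three fair coin tosses; all 8 outcomes equally likely.
omega = ["".join(t) for t in product("HT", repeat=3)]
Y = {w: w.count("H") for w in omega}   # Y = total number of heads

def cond_exp(Y, omega, observed):
    """E[Y | F_n], where F_n is generated by the first `observed` tosses:
    average Y over the atom of outcomes agreeing with w so far."""
    out = {}
    for w in omega:
        atom = [v for v in omega if v[:observed] == w[:observed]]
        out[w] = sum(Y[v] for v in atom) / len(atom)
    return out

print(cond_exp(Y, omega, 1)["HTT"])   # 2.0: one head seen, plus 1 expected
print(cond_exp(Y, omega, 3)["HTT"])   # 1.0: full information recovers Y
```

The averaged guess is $\mathcal{F}_n$-measurable by construction, and on a finite space this atom-averaging is exactly the $L^2$ projection the answer describes.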

If you have heard this analogy but have not yet started studying stochastic processes, I would just ignore it for now, because it is not really very useful until you get to stochastic processes.

Answer 4

(Have you ever played the game "20 Questions"?)

Here is one take on the information role played by the $\sigma$-algebra $\mathcal F$ in a probability space $(\Omega,\mathcal F,\Bbb P)$. Suppose that Tyche the goddess of chance picks a point $\omega$ at random from $\Omega$. (So that the probability that $\omega$ is an element of $B$ is $\Bbb P(B)$, for each $B\in\mathcal F$.) Tyche does not reveal her choice to us, but she is willing to answer questions as to the nature of the chosen point. Thus for each $B\in\mathcal F$, she will truthfully answer the yes-or-no question "Is $\omega$ an element of $B$?" These answers, in aggregate, reveal the set $$ A_{\mathcal F}(\omega):=\{\omega'\in\Omega: 1_B(\omega')=1_B(\omega),\forall B\in\mathcal F\}, $$ which embodies the "information content" of $\mathcal F$ concerning Tyche's choice $\omega$. Of course, if $\mathcal G$ is another $\sigma$-algebra, then $\mathcal F\subset\mathcal G$ implies that $A_{\mathcal F}(\omega)\supset A_{\mathcal G}(\omega)$. The $\sigma$-algebra $\mathcal F$ is said to "separate points" provided $A_{\mathcal F}(\omega)=\{\omega\}$ for all $\omega\in\Omega$, the situation of perfect information.
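On a finite $\Omega$ the set $A_{\mathcal F}(\omega)$ is easy to compute directly; a small Python sketch (the $\sigma$-algebra here, generated by "first toss is heads," is chosen purely for illustration):

```python
# Finite sample space: two coin tosses.
omega = ["HH", "HT", "TH", "TT"]

# F generated by the single event "first toss is heads".
F_events = [frozenset(), frozenset({"HH", "HT"}),
            frozenset({"TH", "TT"}), frozenset(omega)]

def atom(w, events):
    """A_F(w): the points answering every yes/no question 'is w in B?'
    exactly as w does, i.e. lying in precisely the same events of F."""
    return {v for v in omega
            if all((v in B) == (w in B) for B in events)}

print(atom("HT", F_events))   # {'HH', 'HT'}: Tyche only reveals the first toss
```

This $\mathcal F$ does not separate points: knowing the answers to all of Tyche's questions still leaves two candidates for $\omega$.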