Help understanding the definition of a "filtration" in probability theory

1.9k Views Asked by Bumbble Comm At 10 May 2026 - 7:38

I am having trouble understanding Wikipedia's definition of a filtration in probability theory:

Definition "filtration"

Let $(\Omega ,\mathcal {A}, P)$ be a probability space
Let $I$ be a totally ordered index set
Then $\mathbb{F} = \bigl(\mathcal{F}_i\bigr)_{i\in I}$ is a filtration if every $\mathcal{F}_i$ is a sub-$\sigma$-algebra of $\mathcal{A}$ and for all $m,n \in I$ we have $\mathcal{F}_m \subseteq \mathcal{F}_n$ whenever $m \le n$

$~ \square$

This definition feels unintuitive and I would have thought the direction of containment would go in the other direction: $\mathcal{A} = \mathcal{F}_0 \supseteq \mathcal{F}_m \supseteq \mathcal{F}_n$.

After all, the set of possible outcomes becomes smaller as one observes longer prefixes and more of the process that's unfolding.

For concreteness, can someone provide an example of what the structure of the sets $\mathcal{F}_n$ looks like? Say for a sequence of Bernoulli RVs, $X_1, X_2, X_3, \ldots$, what do $\mathcal{F}_0$, $\mathcal{F}_1$, $\mathcal{F}_2$ contain? And what does $\mathcal{A}$ contain?

**Bumbble Comm** · Answer 1 · 2019-12-28 19:02:42

${\cal F}_n$ makes all of the distinctions that ${\cal F}_m$ makes, for $m\leq n$, but may also make finer-grained distinctions. That is, ${\cal F}_n$ may contain sets that are subsets of the smallest sets in ${\cal F}_m$, along with all of the sets in ${\cal F}_m$. That's why ${\cal F}_m \subseteq {\cal F}_n$: the number of possible outcomes--subsets--becomes larger, but (some of) the smallest of them are smaller (or may be, since it's a $\subseteq$ relationship).

For example, suppose that we toss a coin three times, recording 1 for heads and 0 for tails, so the atomic outcomes are the eight triples of 1's and 0's. I'll describe a filtration on this space induced by the three tosses, with $*$ to represent that any possible outcome is allowed for some elements in a triple. That is, I'll use $\langle 0,1,* \rangle$, for example, to refer to the set of all sequences of toss outcomes in which the first toss comes up tails, the second one comes up heads, and the third toss has either outcome. (Note that I am abusing notation a bit to represent a set, not a sequence per se. I'm not distinguishing between what are called atoms and the singleton sets containing them.)

To create each stage of the filtration, I will start with some of these atomic outcomes and create an algebra that contains those atomic outcomes plus all possible unions of them, along with the empty set. (That is, we take the set of atomic outcomes and then take its closure under unions.)

For the first toss, the atomic outcomes are

$$A_0=\langle 0, \ast , * \rangle, \; A_1=\langle 1, * , * \rangle \;,$$

and the full algebra of outcomes is

$${\cal F}_1 = \{ A_0, A_1, A_0 \cup A_1, \varnothing \} = \{ \langle 0, \ast , \ast \rangle, \langle 1, \ast , \ast \rangle, \langle 0, \ast , \ast \rangle \cup \langle 1, \ast , \ast \rangle, \varnothing \}\;. $$

For the first and second tosses, the atomic outcomes are

$$A_{00}=\langle 0,0, * \rangle, \; A_{01}=\langle 0,1, * \rangle, \; A_{10}=\langle 1,0, * \rangle, \; A_{11}=\langle 1,1, * \rangle \;.$$

The second element in the filtration, ${\cal F}_2$, contains these four outcomes plus all of their possible unions and the empty set. This means that ${\cal F}_2$ contains sixteen sets, and I won't list them all. Note that the notation I used for ${\cal F}_1$ just summarizes some of the notation I'm using for ${\cal F}_2$: $A_0=A_{00}\cup A_{01}$ and $A_1=A_{10}\cup A_{11}$. So all of the outcomes in ${\cal F}_1$ are included in ${\cal F}_2$, and therefore ${\cal F}_1 \subseteq {\cal F}_2$.

The atomic outcomes for all three tosses will correspond to the $2^3=8$ possible three-element sequences of 0 and 1, which we can represent as

$$A_{000}=\langle 0,0,0\rangle, \; A_{001}=\langle 0,0,1\rangle, \; A_{010}=\langle 0,1,0\rangle, \;\ldots,\; A_{111}=\langle 1,1,1\rangle \;.$$

(I'm still abusing notation: those are sets.)

The third algebra in the filtration, ${\cal F}_3$, consists of these eight sets plus all possible unions of them, along with the empty set. There are $2^8 = 256$ such sets in ${\cal F}_3$. Note that again some of these unions are equal to the atomic sets defined for ${\cal F}_2$: $A_{00} = A_{000}\cup A_{001}$, and so on. So ${\cal F}_3$ includes all of the outcomes in ${\cal F}_2$, and ${\cal F}_2 \subseteq {\cal F}_3$.
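The construction above can be checked by brute force. The following is only a sketch in Python (the helper names `atoms` and `generated_algebra` are made up for this illustration): it groups the eight outcomes by their first $n$ tosses, forms every union of those atoms, and then checks that the resulting collections are nested. With $k$ atoms there are $2^k$ unions, counting the empty one.

```python
from itertools import combinations, product

# Sample space: all 2**3 outcomes of three coin tosses (1 = heads, 0 = tails).
OMEGA = list(product((0, 1), repeat=3))

def atoms(n):
    """Atoms of F_n: sets of outcomes that agree on the first n tosses."""
    groups = {}
    for omega in OMEGA:
        groups.setdefault(omega[:n], set()).add(omega)
    return [frozenset(g) for g in groups.values()]

def generated_algebra(n):
    """All unions of atoms of F_n (the empty union is the empty set)."""
    parts = atoms(n)
    events = set()
    for k in range(len(parts) + 1):
        for chosen in combinations(parts, k):
            events.add(frozenset().union(*chosen))
    return events

# F[0] is the trivial algebra {empty set, whole space}; F[3] is the power set.
F = {n: generated_algebra(n) for n in range(4)}
sizes = {n: len(F[n]) for n in F}
nested = all(F[n] <= F[n + 1] for n in range(3))

print(sizes)   # number of events in each F_n
print(nested)  # whether F_0, F_1, F_2, F_3 form an increasing chain
```

Each event is stored as a `frozenset` of outcome triples, so `F[n] <= F[n + 1]` is a literal test of ${\cal F}_n \subseteq {\cal F}_{n+1}$ as collections of sets.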

(By the way, the trick of starting with some smallest, atomic sets and then creating the algebras by taking unions is one that works when each of the original random variables can take a finite or countably infinite set of values. When the random variables have continuous, uncountably infinite, sets of values, it's usually necessary to start with specific larger sets, such as intervals, and then form the sigma algebras by closing under complements and countable unions. That's a topic for other questions, which have no doubt been asked and answered.)

**Bumbble Comm** · Answer 2 · 2019-12-28 19:02:56

Sigma algebras are often thought of as containing "information". Conditioning on a larger sigma algebra corresponds to "knowing more" about the values of random variables (more things are measurable with respect to a larger sigma algebra). Often with filtrations, we are thinking about adding random variables to the sigma algebras over time. For instance, if $X_1,X_2,\dots$ is a random walk, then we might have $\mathcal{F}_n = \sigma(X_1,\dots,X_n)$. Then it follows that $\mathcal{F}_m \subseteq \mathcal{F}_n$ whenever $m \leq n$. At time $n$, we "know more" about what the random walk has done than we did at time $m\leq n$.
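To make "more things are measurable" concrete, here is a rough sketch in Python on a finite toy model. The assumptions are mine: the walk takes three $\pm 1$ steps, each outcome lists those steps, and the positions are the partial sums (the first $n$ positions and the first $n$ steps generate the same sigma algebra). On a finite space, a function is measurable with respect to $\mathcal{F}_n$ exactly when it is constant on every set of outcomes sharing the same first $n$ steps, and that is what the hypothetical helper `is_measurable` tests.

```python
from itertools import product

# Toy model (an assumption for illustration): three +/-1 steps of a walk.
OMEGA = list(product((-1, 1), repeat=3))

def is_measurable(f, n):
    """True if f is constant on each set of outcomes sharing the first n steps,
    which on a finite space is the same as being F_n-measurable."""
    seen = {}
    for omega in OMEGA:
        prefix = omega[:n]
        value = f(omega)
        if seen.setdefault(prefix, value) != value:
            return False
    return True

def position_after(k):
    """Position of the walk after k steps: the sum of the first k steps."""
    return lambda omega: sum(omega[:k])

# Is the position after two steps known at times 1, 2 and 3?
s2_known_at_1 = is_measurable(position_after(2), 1)
s2_known_at_2 = is_measurable(position_after(2), 2)
s2_known_at_3 = is_measurable(position_after(2), 3)
print(s2_known_at_1, s2_known_at_2, s2_known_at_3)
```

The position after two steps is not determined at time 1 but is determined from time 2 onward, and anything measurable at time $m$ stays measurable at every later time $n \geq m$.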

**Bumbble Comm** · Answer 3 · 2019-12-28 20:49:11

Let $(X_i)$ be some stochastic process.

You say that, as we observe more and more of the $X_i$, the set of possible outcomes becomes smaller. What you mean by this is the following. Think of the values of the sequence $X_i$ as having already been determined. We want to know exactly what the entire sequence is, but we only get to observe the first $n$ values. This narrows down the set of possible outcomes, that is, the set of possible sequences. If we know that the first $n$ values are $x_1, ..., x_n$, then the set of possible outcomes is reduced to the set of all sequences in

$$A_n=\{(X_i) \in \mathbb R^\mathbb N \mid X_1=x_1, \ldots, X_n=x_n\}$$

It is certainly true that the $A_i$ form a decreasing sequence of sets. In that sense, the set of possible outcomes is decreasing. However, the sequence $(A_i)$ is not what we call a filtration. In fact, the $A_n$ are not sigma algebras, or even collections of events. Each $A_n$ is a single set of possible values of the stochastic process.

If we make an observation of any kind, the sigma algebra it generates represents the collection of events for which we can determine definitively whether or not they happened. For example, let the sample space be the set of all people in a country, and let $H$ be the function mapping a person to their height. Say the sigma algebra on the probability space is discrete (every set of people is an event), with the uniform distribution (so we're picking people at random from the population, and measuring their height). In this situation, the set $A$ of "people with a height less than $6$ feet" and the set $B$ of "people with blonde hair" are both events in the sigma algebra. However, only $A$ is in the sigma algebra generated by $H$. This corresponds to the fact that if I give you $H(p)$, you can tell me definitively whether or not $p$ is in $A$, but you can't tell me whether or not $p$ is in $B$.

Therefore, an increasing sequence of sigma algebras represents the gaining of information. I know more, therefore there are more events which I can tell happened. It is a general fact that if $(X_n)$ is a sequence of random variables, and we set $F_n$ to be the sigma algebra generated by $(X_1, ..., X_n)$, then the $F_i$ are increasing. The more variables we observe, the more events we can tell happened or didn't happen. If I observe only $H$, I can't determine if a person has blonde hair, but if I observe $(H, C)$ (height and hair color), then I can still detect all the same events as when I only observed $H$, but I can also tell if a person's hair is blonde.
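The height and hair example can be played out on a tiny made-up population. This is a sketch only: the four people and their data are invented, and `generated_sigma_algebra` is a hypothetical helper that relies on the finite-space fact that the sigma algebra generated by an observation consists of all unions of its level sets.

```python
from itertools import combinations

# Invented population: name -> (height in inches, hair colour).
PEOPLE = {
    "ann": (64, "blonde"),
    "bob": (70, "brown"),
    "cat": (64, "brown"),
    "dan": (75, "blonde"),
}

def generated_sigma_algebra(observe):
    """All unions of level sets of `observe` (valid on a finite space)."""
    levels = {}
    for person in PEOPLE:
        levels.setdefault(observe(person), set()).add(person)
    blocks = [frozenset(b) for b in levels.values()]
    return {
        frozenset().union(*chosen)
        for k in range(len(blocks) + 1)
        for chosen in combinations(blocks, k)
    }

def height(person):
    return PEOPLE[person][0]

def height_and_hair(person):
    return PEOPLE[person]

sigma_H = generated_sigma_algebra(height)            # observe H only
sigma_HC = generated_sigma_algebra(height_and_hair)  # observe (H, C)

A = frozenset(p for p in PEOPLE if PEOPLE[p][0] < 72)         # under 6 feet
B = frozenset(p for p in PEOPLE if PEOPLE[p][1] == "blonde")  # blonde hair

print(A in sigma_H, B in sigma_H)  # can height alone decide A? decide B?
print(B in sigma_HC)               # can height and hair together decide B?
print(sigma_H <= sigma_HC)         # does observing more keep all old events?
```

Here "ann" and "cat" have the same height but different hair, so no set built from height alone can separate them, which is why $B$ is missing from the sigma algebra generated by $H$.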

For Bernoulli random variables, the sigma algebra generated by $(X_1, ..., X_n)$ is the collection of all sets which we can tell happened (or didn't happen) based only on the values of $X_1$ through $X_n$. Specifically, it is the collection of all subsets $A$ of $\{0, 1\}^{\mathbb N}$ such that if $(x_n)$ is in $A$, and $(y_n)$ is a sequence satisfying $x_i=y_i$ for $i=1,...,n$, then $(y_n)$ is in $A$ as well.
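This prefix criterion can be tested directly, at least on a truncated model. The sketch below is in Python and assumes a finite horizon of three tosses in place of $\{0,1\}^{\mathbb N}$, which is a simplification of mine. It enumerates every subset of the eight outcomes and counts how many satisfy the criterion for each $n$; the helper names are hypothetical.

```python
from itertools import product

N = 3  # finite horizon standing in for the infinite sequence space (assumption)
OMEGA = list(product((0, 1), repeat=N))

def determined_by_prefix(event, n):
    """True if, whenever x is in event and y agrees with x on the first n
    coordinates, y is in event too."""
    return all(
        y in event
        for x in event
        for y in OMEGA
        if x[:n] == y[:n]
    )

def all_events():
    """Every subset of OMEGA, encoded by a bitmask over its 2**N points."""
    for mask in range(2 ** len(OMEGA)):
        yield frozenset(w for i, w in enumerate(OMEGA) if (mask >> i) & 1)

# How many events can be decided from the first n tosses alone?
counts = {
    n: sum(determined_by_prefix(e, n) for e in all_events())
    for n in range(N + 1)
}
print(counts)

# One specific event: "the first two tosses are both heads".
first_two_heads = frozenset(w for w in OMEGA if w[0] == 1 and w[1] == 1)
print(determined_by_prefix(first_two_heads, 1),
      determined_by_prefix(first_two_heads, 2))
```

The counts grow with $n$ because every event decidable from the first $n$ tosses is also decidable from the first $n+1$, which is the inclusion a filtration requires.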
