Differences between $X$, $\mathbb{E}(X)$ and $\mathbb{E}(X\mid\mathcal{F})$?


Intuitively, what are the differences between $X$, $\mathbb{E}(X)$ and $\mathbb{E}(X\mid\mathcal{F})$?

The measure-theoretic approach to probability has been rather confusing. Prior to learning about measure theory in probability, I always learnt that

i) $X$ is a random variable, and ii) $\mathbb{E}(X)$ is a constant obtained by "averaging" over all events in the sample space. Then, in the measure theory course, I learnt that i) $X$ is a measurable function, i.e. the pre-image $X^{-1}(A)=\{\omega\in\Omega:X(\omega)\in A\}$ lies in $\mathcal{F}$ for every Borel set $A$; and ii) if $X$ is $\mathcal{F}$-measurable, then $\mathbb{E}(X\mid\mathcal{F})=X$. I also interpret $\mathcal{F}$ as the "information we currently have" (I know mathematically it is a $\sigma$-algebra on $\Omega$).

Therefore, to ensure that I have the right intuition for the three quantities $X$, $\mathbb{E}(X)$ and $\mathbb{E}(X\mid\mathcal{F})$, i) which of these are random variables and which are constants; and ii) how do I interpret the differences between each one (particularly between $\mathbb{E}(X)$ and $\mathbb{E}(X\mid\mathcal{F})$)?

On BEST ANSWER

Those definitions and what they mean intuitively can indeed be confusing when you are new to the subject.
The first point to make here is that one should view all those expectations as functions on the underlying sample space $\Omega$. There is a hidden argument $\omega \in \Omega$ in all of them, so we really have $$ X=X(\omega), \quad E[X] = E[X](\omega), \quad E[X|\mathcal{F}] = E[X|\mathcal{F}](\omega). $$ All of those should be seen as the best possible approximation of the function $X(\omega)$ given the information currently at hand. To make this precise we need the concept of a filtration of $\sigma$-algebras.
The random variable $X(\omega)$ itself is really nothing more than a measurable function on $\Omega$ with respect to some $\sigma$-algebra $\mathcal{F}$. It maps points $\omega$ in the sample space to values.

The collection of sets in $\mathcal{F}$ should be seen as representing all the events (collections of sample points) that you can distinguish given the possible values of $X(\omega)$.
To make this clear, let us make a very simple example:
Let $X(\omega)$ be defined on the discrete set $\Omega = \{ 1,2,3,4\}$ with values in $\mathbb{R}$, where $X(1) = 10$, $X(2) = 10$, $X(3) = 20$, $X(4) = 30$. We can see this $X$ as describing random draws from $\Omega$, but there is no need to introduce any probability measure at this point.

The usual Borel $\sigma$-algebra on $\mathbb{R}$ now induces a $\sigma$-algebra on $\Omega$. The pre-image under $X$ of any Borel set in $\mathbb{R}$ is precisely one of the sets in $$ \mathcal{F} = \{\emptyset, \,\Omega, \{1, 2 \}, \{ 3\}, \{ 4\}, \{ 3, 4\}, \{1,2,3 \}, \{1, 2,4 \}\}. $$ If you are only allowed to observe the possible values of $X(\omega)$, but not $\omega$ itself, you will never (regardless of which outcome you observe) be able to answer the question: did the random draw result in a $1$ or a $2$?
If the value observed is $20$, you know $\omega$ was neither $1$ nor $2$. If the value observed is $10$, you know $\omega$ is either $1$ or $2$, but not which.
When one says that the $\sigma$-algebra $\mathcal{F}$ does not contain the information to separate the sets $\{ 1\}$ and $\{2\}$, this is what it means. Those sets are not in the $\sigma$-algebra and hence cannot be distinguished by looking at the possible function values of $X$ on them, since $X$ has the same value for every $\omega$ in the set: $X(1) = X(2)$.
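As a sanity check, the induced $\sigma$-algebra can be computed mechanically from the level sets of $X$. The following Python sketch uses the values from the example above; the names `level_sets`, `atoms` and `sigma_X` are mine, purely illustrative:

```python
from itertools import combinations

# The example from the answer: Omega = {1,2,3,4},
# X(1) = X(2) = 10, X(3) = 20, X(4) = 30.
X = {1: 10, 2: 10, 3: 20, 4: 30}

# The atoms of the sigma-algebra generated by X are its level sets
# {w : X(w) = v}, one per distinct value v.
level_sets = {}
for w, v in X.items():
    level_sets.setdefault(v, set()).add(w)
atoms = [frozenset(s) for s in level_sets.values()]  # {1,2}, {3}, {4}

# The generated sigma-algebra is the collection of all unions of atoms:
# 2^3 = 8 sets, matching the 8-element F displayed above.
sigma_X = set()
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        sigma_X.add(frozenset().union(*combo))

for s in sorted(sigma_X, key=lambda s: (len(s), sorted(s))):
    print(set(s) or "{}")
```

Note that $\{1\}$ and $\{2\}$ never appear in the output: no union of atoms can separate them, which is exactly the point made above.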

We can see $X(\omega)$ as the outcome of an experiment resulting in the sample $\omega$. Everything about it is known: we know the function $X$, we know the sample, and we can calculate $X(\omega)$.
Here it is helpful to introduce a time dimension. Assume that we fully know $X$ and that the full outcome $\omega$ of the experiment is revealed at a time $T>0$. The $\sigma$-algebra considered then is $\mathcal{F}_T := \mathcal{F}$. Let us also assume we have a probability measure $\mathbb{P}$ describing the probability of the different outcomes for $X$.
We now have a full probability space $(\Omega, \mathcal{F}_T, \mathbb{P})$. Here, by definition we have $$ X(\omega) = E[X|\mathcal{F}_T](\omega). $$ $X$ is fully known given the information in the $\sigma$-algebra $\mathcal{F}_T$.

Assume now that we are at an earlier point in time, when the full outcome of the experiment is not yet known. For example, let us start at time $0$. At that point we assume nothing about the experiment is known. We know the experiment can result in the values $10$, $20$ or $30$, but so far we have no information about the outcome except what the possible values are and their probabilities.

At this point in time we cannot differentiate between any of the subsets in $\mathcal{F}$. Given a set in $\mathcal{F}$, we cannot answer the question whether the sample $\omega$ lies in the set or not, since nothing about $\omega$ is known yet.
We now want to make the best possible approximation of the function $X(\omega)$ at time $0$. Since nothing is known, the best we can do is approximate it with the constant function which equals its "usual" expected value $$ E[X](\omega) = \sum_i X(\omega_i) \cdot \mathbb{P}(\{\omega_i\}). $$ So the function does not depend on $\omega$ and is therefore constant. We have taken the probability-weighted average over all possible outcomes, since we cannot differentiate between any outcomes so far. That means that the natural $\sigma$-algebra for the expected-value function here, representing no information at all, is the trivial one $\mathcal{F}_0 = \{ \emptyset, \Omega\}$.
By the way, regarding notation, we have by definition that $$ E[X] = E[X](\omega) = E[X| \mathcal{F}_0](\omega) = \textrm{ A constant} $$
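As a concrete sketch of this time-$0$ expectation: the answer never fixes a probability measure, so the uniform one $\mathbb{P}(\{\omega\}) = 1/4$ below is a made-up choice for illustration only.

```python
# The example from the answer, with assumed uniform probabilities.
X = {1: 10, 2: 10, 3: 20, 4: 30}
P = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}

# E[X] = sum over sample points of X(w) * P({w}): a single number, i.e.
# the constant function on Omega, measurable w.r.t. F_0 = {{}, Omega}.
EX = sum(X[w] * P[w] for w in X)
print(EX)  # 17.5
```

Whatever $\omega$ turns out to be, this "function" returns the same value, which is why $E[X]$ can be treated as a constant.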

I want to make a remark here about something that can be a bit confusing. At time $t=0$ it would be natural to regard the outcome $\omega$ as the unknown thing, not the function $X$. Mathematically, however, it is represented somewhat the other way around: in the calculations we view $\omega$ as a fully known entity, but the expected-value function at that time can only "see" part of the information contained in $\omega$ (which is none of it at time $0$).

Now assume we are at some point in time $t$ between $0$ and $T$, where we know a little bit about the outcome of the experiment, but not all there is to know. Let us say that at that time we know whether the experiment had an outcome strictly smaller than $30$ or not.
That means we know whether $\omega \in \{1,2,3 \}$ or $\omega \in \{ 4\}$: for those two sets, we can say for sure whether any possible $\omega$ belongs to the set or not.
This is not the case for sets like $\{ 2\}$ or $\{ 3, 4\}$. If we know the outcome was smaller than $30$, $\omega$ could be in the set $\{ 2\}$, but we can't know for sure. The same holds for the other set: if the outcome was smaller than $30$, $\omega$ could belong to $\{ 3, 4\}$, but we don't know.
The $\sigma$-algebra of sets we can separate given the information at that time is $$ \mathcal{F}_t = \{ \emptyset, \Omega, \{1,2,3\}, \{ 4\}\}. $$ We clearly have $$ \mathcal{F}_0 \subset \mathcal{F}_t \subset \mathcal{F}_T. $$ Now we want to calculate the expected value of $X$ at time $t$ given the information described. This is still a function on $\Omega$, but it is only measurable with respect to the $\sigma$-algebra $\mathcal{F}_t$. We cannot separate the points within the sets of $\mathcal{F}_t$ based on our current information. Therefore the expected value of $X$ given the information must be constant on these sets, and the best approximation we can make on each set is to take the expected value of $X$ given that the sample $\omega$ belongs to the set:
$$ \begin{eqnarray*} E[X | \mathcal{F}_t](\omega) &=& \frac{1}{\mathbb{P}(\{ 1,2,3\})} \left( X(1) \cdot \mathbb{P}(\{ 1\}) + X(2) \cdot \mathbb{P}(\{ 2\})+ X(3) \cdot \mathbb{P}(\{ 3\})\right) \textrm{ if } \omega \in \{ 1,2,3\}\\ E[X | \mathcal{F}_t](\omega) &=& \frac{1}{\mathbb{P}(\{ 4\})} \left( X(4) \cdot \mathbb{P}(\{ 4\}) \right) = X(4) \textrm{ if } \omega \in \{ 4\} \end{eqnarray*} $$ So $E[X | \mathcal{F}_t](\omega)$ is here a function taking two different values. This is our best approximation of $X$ given the information described.
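The two-line formula above amounts to: average $X$ over each atom of $\mathcal{F}_t$, weighted by probability. A small Python sketch of this (again with assumed uniform probabilities, since the answer leaves $\mathbb{P}$ unspecified; `cond_exp` is my own illustrative helper):

```python
# The example from the answer, with assumed uniform probabilities.
X = {1: 10, 2: 10, 3: 20, 4: 30}
P = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
partition = [{1, 2, 3}, {4}]  # the atoms of F_t

def cond_exp(X, P, partition):
    """Return E[X | F](w) as a dict: constant on each atom, equal to the
    probability-weighted average of X over that atom."""
    out = {}
    for atom in partition:
        p_atom = sum(P[w] for w in atom)
        avg = sum(X[w] * P[w] for w in atom) / p_atom
        for w in atom:
            out[w] = avg
    return out

ce = cond_exp(X, P, partition)
print(ce)  # w in {1,2,3} all map to one value, w = 4 maps to X(4) = 30
```

The same helper recovers the two extreme cases discussed above: with the trivial partition `[{1, 2, 3, 4}]` every $\omega$ maps to the constant $E[X]$, and with the singleton partition `[{1}, {2}, {3}, {4}]` we get back $X$ itself, i.e. $E[X|\mathcal{F}_T] = X$.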
In general we have a whole so-called filtration of $\sigma$-algebras $$ \mathcal{F}_0 \subset \mathcal{F}_s \subset \mathcal{F}_T, \quad 0 < s < T, $$ where the $\sigma$-algebras are indexed by a variable $s$, which one can see as time, and the information increases with time.
So all the functions $E[X | \mathcal{F}_s](\omega)$ can be seen as the best possible approximation of $X$ given the information at a certain time, starting with zero information and a constant function, and ending with full information and the full $X(\omega) = E[X | \mathcal{F}_T](\omega)$.

In this example I have used a discrete random variable to illustrate the principle. In the continuous case, the sums are replaced by integrals, and the conditional expectations become probability-weighted integral averages instead.

  1. $\Omega$, the probability space, is the set of all conceivable outcomes of the random experiment, with a maximum level of detail. For instance, if you're throwing a coin, the elements of $\Omega$ could be trajectories, each $\omega\in\Omega$ containing all the information about the position of the coin as a function of time, how it rotated over time, where it hit the ground, how it bounced, etc.
  2. $X$, a measurable function on $\Omega$, represents you throwing out some of that detail and picking a specific property of the random outcome to measure. Thus if $\omega$ is a trajectory, $X(\omega)$ could be the side of the coin that's face up at the end of that trajectory. We're used to thinking of functions as things that transform, but think of $X$ as something that masks, it just gets rid of detail, like a function that given a $10$ dimensional vector just gives you the first and second coordinates.
  3. If $X$ is a number or a vector, then we can talk about its average value $E(X)$.
  4. A sigma-algebra $\mathcal T$ is a set of yes-no questions which we are capable of knowing the answers to. That is, when the outcome of the experiment is $\omega\in\Omega$, you actually don't get to know exactly which $\omega$ it is. However, given any $A\in\mathcal T$, you can determine whether or not $\omega\in A$. Which sigma-algebra you have "access" to depends on the situation. If someone else flips the coin, so you don't get to see its full trajectory, and they just tell you $X$, the side that came up, then you only have access to the four-element sigma-algebra generated by $X$. You don't know which $\omega$ happened, just what the value of $X(\omega)$ is.
  5. For a function $f:(A,\mathcal T)\to(B,\mathcal S)$ to be measurable means that if you know the answer to every $\mathcal T$-question about $a\in A$, then you know the answer to every $\mathcal S$-question about $f(a)$.
  6. Now suppose a random experiment is performed, so we get some $\omega\in\Omega$. I don't get to see $\omega$, the only thing I know about it is the sigma algebra $\mathcal T$. Now say there's some measurable function/random variable $X$ on $\Omega$ which takes on numeric values. Given that I only know $\mathcal T$, what can I say about the value of $X$? Well, what I can do is, given all the knowledge available to me, take a guess at the average of $X$. For a different $\omega$ there will be different knowledge available to me, so my guess will be different. Therefore we can say that this guess is a function of $\omega$, and basically by definition it will be measurable with respect to $\mathcal T$, since its value is determined solely on the basis of $\mathcal T$-knowledge. We write $E(X|\mathcal T)(\omega)$ for this function.
  7. Notice that it's a little counterintuitive to write this guess as a function of $\omega$, since we don't know $\omega$ itself when we make the guess. We only know the answers to $\mathcal T$-questions about $\omega$, so writing our guess in the form $f(\omega)$ might feel kind of wrong. However, since our guess does depend only on $\omega$, this is still technically correct. Imagine that $h(\omega)$ represents our knowledge about $\omega$, the vector of answers to every $\mathcal T$-question about $\omega$. Then the value we actually get to know is $h(\omega)$, and we use some function $g$ to calculate $g(h(\omega))$, our guess for $X$ on the basis of that information. Then $E(X|\mathcal T)$ is a notation for $g\circ h$, and not $g$, while $g$ is the actual function that we know how to calculate.
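Point 7's decomposition $E(X|\mathcal T) = g\circ h$ can be rendered as a toy computation. Everything below is illustrative: the sample space, the uniform probabilities, and the names `h` and `g` follow the answer's notation but are otherwise made up.

```python
# A finite toy model: four outcomes, a random variable X, assumed
# uniform probabilities, and a sigma-algebra T given by its questions.
X = {1: 10, 2: 10, 3: 20, 4: 30}
P = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
T = [frozenset({1, 2, 3}), frozenset({4})]  # the T-questions we can answer

def h(w):
    """Our knowledge about w: the vector of yes/no answers to
    'is w in A?' for each question A in T."""
    return tuple(w in A for A in T)

def g(answers):
    """Our guess for X given only the answers: average X over all
    sample points that would have produced the same answers."""
    consistent = [w for w in X if h(w) == answers]
    return sum(X[w] * P[w] for w in consistent) / sum(P[w] for w in consistent)

# The composition g(h(w)) is the function written E(X|T)(w) above.
guess = lambda w: g(h(w))
print([guess(w) for w in sorted(X)])
```

As the answer notes, `g` is the function we actually know how to compute from our knowledge, while `E(X|T)` names the composition `g(h(w))`, which formally takes the unobserved $\omega$ as its argument.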