Help Understanding Measure Theory definitions of expectations and densities


This is a bit of a continuation of my last post. I'm studying measure theory and probability, and I'm trying to relate the definitions of expectations, PDFs and CDFs from a typical first-year undergraduate probability course to the ones that use measure theory.

We begin with a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and a random variable $X:\Omega \rightarrow \mathbb{R}$ which maps to the space $(\mathbb{R},B(\mathbb{R}))$, where $B(\mathbb{R})$ is the Borel $\sigma$-algebra on the real line.

We define the pushforward measure of $X$ on $(\mathbb{R},B(\mathbb{R}))$ (also known as the law of $X$ or the distribution of $X$), $Q_X: B(\mathbb{R}) \rightarrow [0,1]$, by $Q_X(B) = \mathbb{P}(X^{-1}(B)) = \mathbb{P}(X \in B) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in B\})$, for all $B \in B(\mathbb{R})$.
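As a concrete sanity check of the pushforward idea, here is a small numerical sketch. All the specifics are illustrative choices of mine: I take $\Omega = [0,1]$ with $\mathbb{P}$ the Lebesgue (uniform) measure and $X(\omega) = -\ln(1-\omega)$, which pushes $\mathbb{P}$ forward to the Exponential(1) law, so $Q_X((a,b]) = e^{-a} - e^{-b}$:

```python
import math
import random

# Toy probability space: Omega = [0, 1] with P = Lebesgue (uniform) measure.
# X(omega) = -ln(1 - omega) pushes P forward to the Exponential(1) law,
# so Q_X((a, b]) = P({omega : a < X(omega) <= b}) should equal e^-a - e^-b.

def X(omega):
    return -math.log(1.0 - omega)

random.seed(0)
omegas = [random.random() for _ in range(200_000)]  # draws from P

a, b = 0.5, 2.0
# Empirical pushforward: fraction of sampled omega with X(omega) in (a, b]
q_empirical = sum(1 for w in omegas if a < X(w) <= b) / len(omegas)
q_exact = math.exp(-a) - math.exp(-b)

print(q_empirical, q_exact)  # the two should agree to Monte Carlo accuracy
```

The point is just that $Q_X$ measures sets of *values* of $X$ by measuring their preimages in $\Omega$ under $\mathbb{P}$.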

For some measurable function $g:\mathbb{R} \rightarrow \mathbb{R}$, we define the expectation as $E(g(X)) = \int_{\Omega} g(X(\omega)) d\mathbb{P}(\omega)$.

Writing $x = X(\omega)$, we apply the change of variables formula and obtain $E(g(X)) = \int_{\mathbb{R}} g(x) \, dQ_{X}(x)$.

Now we assume that $Q_X$ is absolutely continuous with respect to some reference measure $\mu$ on $(\mathbb{R}, B(\mathbb{R}))$, i.e. $Q_X \ll \mu$. Provided $\mu$ is $\sigma$-finite (as a probability measure, $Q_X$ is automatically $\sigma$-finite), the Radon-Nikodym theorem gives a function $f_X = \frac{dQ_X}{d\mu}$, which we call the density of $Q_X$ with respect to $\mu$, such that $Q_X(B) = \int_{B}f_X(x)\,d\mu(x)$.

However, recall that $Q_X(B) = \mathbb{P}(X \in B) = \int_B f_X d \mu$. But also, $\mathbb{P}(X \in B) = \mathbb{P}(X^{-1}(B)) = \int_{X^{-1}(B)}d\mathbb{P}$.

Therefore we can write $\mathbb{P}(X \in B) = \int_{X^{-1}(B)}d\mathbb{P} = \int_B f_X d\mu$.

Also note that $F_X(x) = Q_X((-\infty, x]) = \mathbb{P}(X \in (-\infty,x]) = \mathbb{P}(X \leq x)$ is called the Distribution Function (CDF). This is just the pushforward measure applied to the set $(-\infty,x]$ for some $x \in \mathbb{R}$. Using the result above, this can be written as $F_X(x) = \int^x_{-\infty}f_X d\mu$.

Now that we have applied the Radon-Nikodym theorem to the pushforward measure $Q_X$ using some general reference measure $\mu$, we can instead apply it with the Lebesgue measure $\lambda$. We say that a random variable $X$ is continuous iff its pushforward measure is absolutely continuous with respect to the Lebesgue measure, that is, $Q_X \ll \lambda$. Assuming this is the case, the results above can be rewritten as follows:

$Q_X(B) = \mathbb{P}(X \in B) = \int_B f_X(x) d\lambda(x)$. Or using shorthand notation $\mathbb{P}(X \in B) = \int_B f_X(x) dx$, which is a common result from basic undergraduate first year probability courses.
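As a numerical check of this identity (taking $X$ standard normal for concreteness, and $B = [-1, 1]$), the Riemann-sum approximation of $\int_B f_X\,d\lambda$ matches the closed form $\mathbb{P}(-1 \le X \le 1) = \operatorname{erf}(1/\sqrt{2})$:

```python
import math

# Standard normal density f_X with respect to Lebesgue measure
def f(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# P(X in B) = integral over B of f_X dlambda, here B = [-1, 1],
# approximated by a midpoint Riemann sum.
a, b, n = -1.0, 1.0, 10_000
h = (b - a) / n
p_riemann = sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Closed form via the error function: P(-1 <= X <= 1) = erf(1 / sqrt(2))
p_exact = math.erf(1.0 / math.sqrt(2.0))

print(p_riemann, p_exact)
```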

Similarly, we get $F_X(x) = \mathbb{P}(X \leq x) = \int^x_{-\infty}f_X(t) \, d\lambda(t)$, or using shorthand $\mathbb{P}(X \leq x) = \int^x_{-\infty}f_X(t)\,dt$, which is also a basic result.

Then the expectation of $g(X)$ is $E(g(X)) = \int_{\mathbb{R}}g(x)f_X(x)\,dx$ (using shorthand notation for the integral with respect to Lebesgue measure), which is what we are taught in basic probability courses.
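The two ways of computing $E(g(X))$ can be checked against each other numerically. In this illustrative sketch I take $X \sim N(0,1)$ and $g(x) = x^2$, so $E(g(X)) = 1$; one route integrates $g f_X$ against Lebesgue measure, the other averages $g$ over draws from the law of $X$ (the pushforward view):

```python
import math
import random

def f(x):  # standard normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def g(x):
    return x * x  # E[g(X)] = Var(X) = 1 for X ~ N(0, 1)

# E[g(X)] = integral of g(x) f_X(x) dx, truncated to [-8, 8]
# (the tails beyond that contribute a negligible amount).
a, b, n = -8.0, 8.0, 100_000
h = (b - a) / n
e_integral = sum(g(a + (i + 0.5) * h) * f(a + (i + 0.5) * h)
                 for i in range(n)) * h

# Monte Carlo: average g over draws from the distribution Q_X
random.seed(0)
e_mc = sum(g(random.gauss(0.0, 1.0)) for _ in range(200_000)) / 200_000

print(e_integral, e_mc)  # both approximate E[g(X)] = 1
```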

My questions are the following:

  1. In basic probability courses, we are taught that the PDF is the derivative of the CDF, that is, $f_X(x) = F_X'(x)$. How can we show this result using a measure-theoretic approach? Is this related to the fact that the pushforward measure $Q_X$ is entirely characterised by the CDF $F_X$?

  2. Generally in basic probability classes, we are given the "name of a distribution" as well as its PDF. For example the Gaussian Distribution with pdf $f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}$. So does this mean that we don't actually care about the pushforward measure $Q_X$ itself? Does the word "distribution" in Gaussian Distribution refer to the distribution $Q_X$? So when we refer to a random variable $X$ we usually just refer to its pushforward measure?

  3. What exactly is the relation between the distribution $Q_X$ and the density $f_X$? Can you get one from the other? For example given the density of the Gaussian random variable, can we always find its pushforward measure? Are we even interested in this?

  4. What do you do if your pushforward measure is not absolutely continuous with respect to the Lebesgue measure? Does this mean that the random variable just does not have a density (although it always has a distribution $Q_X$)?

Sorry if these are quite basic or silly questions. Thanks in advance.


  1. In basic probability courses we are taught that the PDF is the derivative of the CDF

This is true... if the PDF is (can be chosen to be) continuous. That'd be the fundamental theorem of calculus. However, it's not true in general because the CDF need not be differentiable everywhere. That said, it does need to be differentiable a.e. if the PDF exists. At any rate, it's more accurate to say the PDF is the Radon-Nikodym derivative of the distribution, and the PDF is only well defined up to a.e. equivalence.

  2. So does this mean that we don't actually care about the pushforward measure $Q_X$ itself? Does the word "distribution" in Gaussian Distribution refer to the distribution $Q_X$? So when we refer to a random variable $X$ we usually just refer to its pushforward measure?

The pushforward measure is absolutely what we care about. It is rare for anyone to care about the specific function $X:\Omega\to\Bbb R$ or the specific values $X(\omega)$; the distribution, a.k.a. the 'law', i.e. the pushforward measure on $\Bbb R$, is the item of interest. Generally, theorems in probability do not care about $\Omega$, and if $X':\Omega'\to\Bbb R$ had the same distribution as $X$, we wouldn't mind considering $X'$ instead of $X$.

  3. What exactly is the relation between the distribution $Q_X$ and the density $f_X$? Can you get one from the other? For example given the density of the Gaussian random variable, can we always find its pushforward measure? Are we even interested in this?

We are absolutely interested in the pushforward measure. If the density exists, then by the very definition of density we have $\mathrm{d}Q_X/\mathrm{d}\lambda = f_X$ ($\lambda$-a.e.), i.e. $Q_X(A)=\int_A f_X\,\mathrm{d}\lambda$, so the density completely determines the distribution.

  4. What do you do if your pushforward measure is not absolutely continuous with respect to the Lebesgue measure? Does this mean that the random variable just does not have a density (although it always has a distribution $Q_X$)?

That's right. The random variable has a density iff the distribution is absolutely continuous with respect to $\lambda$, iff $\mathbb{P}(X\in A)=0$ whenever $\lambda(A)=0$. For example, any 'discrete' random variable will fail to have a (Lebesgue) density for this reason: it assigns positive probability to a set of Lebesgue measure zero.
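A discrete random variable still has a density with respect to a *different* reference measure, namely counting measure on its support, and then the Radon-Nikodym derivative is just the familiar PMF. A minimal sketch, using a fair die as the (illustrative) example:

```python
# A 'density' still exists for a discrete random variable, just not with
# respect to Lebesgue measure: take mu = counting measure on {1, ..., 6}.
# Then the PMF is the Radon-Nikodym derivative dQ_X/dmu, and integrating
# with respect to counting measure is a plain sum.

pmf = {k: 1.0 / 6.0 for k in range(1, 7)}  # fair die

def prob(A):
    # Q_X(A) = integral over A of f_X d(counting measure) = sum of the PMF
    return sum(pmf.get(k, 0.0) for k in A)

print(prob({2, 4, 6}))           # P(X even) = 1/2
print(prob(set(range(1, 7))))    # total mass 1
```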


To Question 1): any non-decreasing, right-continuous $F:\mathbb R \to [0,1]$ with $F(-\infty) = 1-F(\infty) = 0$ gives rise to a unique probability measure $\mu_F$ on $(\mathbb R,\mathcal B(\mathbb R))$ with $\mu_F((-\infty,t]) = F(t)$. If $\mu_F \ll \mathrm{Leb}$, then $\mu_F$ has a Radon-Nikodym derivative $f$, and thus one has $$F(x)=\mu_F((-\infty,x]) = \int_{-\infty}^x f(t)\,dt.$$ Moreover, the fundamental theorem of calculus for Lebesgue integrals gives $$F'(x) = f(x) \quad \text{for almost every } x.$$ For more details, see for example Chapter 3 of Folland's Real Analysis, or Chapter 1 of Varadhan's Probability Theory.
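The relation $F' = f$ is easy to verify numerically where $F$ is smooth. Taking the standard normal as an illustrative example (its CDF is available in closed form via the error function), a central finite difference of $F$ matches the density at every sampled point:

```python
import math

def Phi(x):  # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):  # standard normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# F'(x) ~ (F(x + h) - F(x - h)) / (2h) should match f(x) pointwise here,
# since this CDF is everywhere differentiable.
h = 1e-5
max_err = 0.0
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    deriv = (Phi(x + h) - Phi(x - h)) / (2.0 * h)
    max_err = max(max_err, abs(deriv - phi(x)))

print(max_err)  # tiny: the finite difference reproduces the density
```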

To Question 2)

I believe you are asking about whether or not the underlying probability space $(\Omega,\mathcal F,P)$ is always left as abstract. In general, yes, though there are special cases to the contrary. When working with real-valued discrete time Markov processes, it is convenient to have $(\Omega,\mathcal F) = (\mathbb R^{\mathbb N},\mathcal B(\mathbb R^{\mathbb N}))$ so that one can work with shift operators. Here the $\sigma$-algebra is the cylindrical one.

In fact, the pushforward can “fail”, and one cannot always work solely with it. For a continuous-time stochastic process, one can view $X$ as a map $X:(\Omega,\mathcal F,P)\to (\mathbb R^{\mathbb R_+},\mathcal B(\mathbb R^{\mathbb R_+}))$, where the $\sigma$-algebra on the path space is again the cylindrical one. The issue is that this $\sigma$-algebra can only give information about the process at countably many times; in particular, the set of continuous paths is NOT measurable with respect to the cylindrical $\sigma$-algebra. (This does not mean the event that $X$ is continuous fails to lie in $\mathcal F$; it just cannot be read off from the pushforward.) Thus the pushforward here is actually very limited and does not even characterize the stochastic process: two processes with the same pushforward can only be said to agree in distribution at countable sets of times.

Admittedly, this is a somewhat contrived example, but I wanted to make the point that there is a bit more subtlety than "you only ever need the pushforward".

To Question 4): I'm not sure what you have in mind here. It is not necessary, and often not even convenient, to work with Lebesgue densities. Trivially, any measure is absolutely continuous with respect to itself, and you can often find a useful reference measure other than Lebesgue. As an example, consider a random variable which is a Dirac point mass with probability $p$ and a Gaussian with probability $1-p$. Its pushforward is not absolutely continuous with respect to Lebesgue measure, but it is absolutely continuous with respect to $\delta+\mathrm{Leb}$. This is useful when working with the factorization theorem for sufficient statistics: in elementary contexts, one can only state the factorization theorem for continuous or discrete random variables, but it in fact holds for any random variable whose pushforward is dominated by a $\sigma$-finite measure. Note that the dominating measure need not be Lebesgue; it could be, say, counting measure on $\mathbb Q$, which is $\sigma$-finite (whereas counting measure on all of $\mathbb R$ is not). For more, see Chapter 2 of Shao's Mathematical Statistics.
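The Dirac-plus-Gaussian mixture is easy to simulate, and the simulation makes the failure of absolute continuity visible: the singleton $\{0\}$ has Lebesgue measure zero yet carries probability $p$, which no Lebesgue density could produce. A sketch with illustrative parameters (point mass at $0$, $p = 0.3$):

```python
import random

# Mixture law: with probability p, X = 0 (a Dirac point mass at 0);
# otherwise X ~ N(0, 1). This Q_X is not absolutely continuous w.r.t.
# Lebesgue measure, since Q_X({0}) = p > 0 while lambda({0}) = 0,
# but it does have a density with respect to delta_0 + Lebesgue.
p = 0.3

def sample():
    return 0.0 if random.random() < p else random.gauss(0.0, 1.0)

random.seed(0)
draws = [sample() for _ in range(200_000)]

# A singleton carrying positive probability -- impossible under any
# Lebesgue density:
p_atom = sum(1 for x in draws if x == 0.0) / len(draws)
print(p_atom)  # close to p = 0.3
```

Equivalently, the CDF of this mixture has a jump of height $p$ at the atom, so it cannot be an integral of any Lebesgue-integrable density.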