I want to understand the connection between probability density functions and generalized functions.
- There are three commonly used classes of generalized functions: $\mathscr{E}'\subset\mathscr{S}'\subset\mathscr{D}'$. They are the dual spaces of $C^\infty\supset \mathscr{S} \supset C_c^\infty$, respectively, with the dual pairing given by the Lebesgue integral. (All on $\mathbb{R}^n$.)
- Formally, a probability distribution should be seen as a measure $\mu$: the Lebesgue-Stieltjes measure associated with a random variable $X$. It is a measure on the space $(\mathbb{R}^n,\mathscr{B}_{\mathbb{R}^n})$, where $\mathscr{B}_{\mathbb{R}^n}$ denotes the Borel $\sigma$-algebra.
They are very similar, but there are several essential differences.
- Generalized functions are based on the Lebesgue integral, while probability distributions are based on the Lebesgue-Stieltjes integral. The integrals are taken over different measurable spaces (one contains all the Lebesgue measurable sets, the other only the Borel sets).
- If we regard $\mu$ as a functional (though I don't know how), does $\mu$ belong to any of $\mathscr{E}'$, $\mathscr{S}'$, or $\mathscr{D}'$?
- $\mathscr{E}'$, $\mathscr{S}'$, and $\mathscr{D}'$ don't seem to be natural spaces for representing probability distributions, because continuity is not so important in probability.
So are there good ways to unify these two different things, so that we could apply most of the theorems developed for $\mathscr{E}'$, $\mathscr{S}'$, or $\mathscr{D}'$ to a probability distribution?
I think it is pretty standard to say that a probability space is a triple $(\Omega ,\mathscr B,\mu )$, where $\Omega $ is a set, $\mathscr B$ is a $\sigma $-algebra of subsets of $\Omega $, and $\mu $ is a positive, $\sigma $-additive function from $\mathscr B$ to ${\mathbb R}$, such that $\mu (\Omega )=1$.
The whole point, of course, is that if $E\in\mathscr B$, then the number $\mu(E)$ represents the probability that a randomly chosen point $\omega$ lies in $E$.
If $\Omega ={\mathbb R}^n$, there are many $\sigma $-algebras to choose from, such as the Lebesgue measurable sets, the Borel measurable sets, and many more. The criterion for this choice will most likely depend on which random variables you plan to study. For instance, if you only care about continuous random variables, the best choice is certainly the Borel $\sigma $-algebra, as that is essentially the smallest one that works, and then you won't need to worry about stranger-looking events.
On the other hand, there are lots of situations in which a probability space pops up from other mathematical gadgets. Perhaps the most common such gadget is the Riesz-Markov-Kakutani representation theorem, stating that positive linear functionals on $C_0({\mathbb R}^n)$ of norm 1 correspond to Borel probability measures on ${\mathbb R}^n$. There is also a nice generalization to locally compact topological spaces.
The correspondence, in the direction "measure $\to$ linear functional", is as follows: given a Borel probability measure on ${\mathbb R}^n$, you may define a linear functional $\varphi _\mu $ by $$ \varphi _\mu (f) = \int_{{\mathbb R}^n} f(x)\, d\mu (x),\quad \forall f\in C_0({\mathbb R}^n). $$
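As a numerical sketch (my own illustration, not part of the theorem): take $\mu$ to be the standard Gaussian measure on $\mathbb{R}$ and $f(x)=1/(1+x^2)$, which vanishes at infinity and so lies in $C_0(\mathbb{R})$. Then $\varphi_\mu(f)$ can be evaluated by quadrature, and the bound $0<\varphi_\mu(f)\le\|f\|_\infty=1$ (coming from positivity and the norm-1 condition) can be checked directly:

```python
import numpy as np
from scipy import integrate, stats

# Sketch: the functional phi_mu induced by the standard Gaussian
# measure mu on R, applied to f(x) = 1/(1+x^2), a function in C_0(R).
def f(x):
    return 1.0 / (1.0 + x**2)

# phi_mu(f) = \int f dmu = \int f(x) p(x) dx, since mu has density p.
phi_mu_f, _ = integrate.quad(lambda x: f(x) * stats.norm.pdf(x),
                             -np.inf, np.inf)
print(phi_mu_f)

# Positivity and the norm-1 bound: 0 < phi_mu(f) <= sup|f| = 1.
assert 0.0 < phi_mu_f < 1.0

# Integrating the constant 1 (not in C_0, but mu-integrable)
# recovers the total mass mu(R) = 1.
total, _ = integrate.quad(stats.norm.pdf, -np.inf, np.inf)
print(total)  # ≈ 1.0
```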
Distributions are also linear functionals on certain function spaces, but their primary use is to generalize the concept of functions in order to allow for more freedom when solving differential equations.
Nevertheless, considering that $C_0({\mathbb R}^n)\supseteq S({\mathbb R}^n)\supseteq C^\infty _c({\mathbb R}^n)$, a probability measure, seen as a linear functional on $C_0({\mathbb R}^n)$ as above, may be restricted to $S({\mathbb R}^n)$, yielding a tempered distribution (an element of the dual of $S({\mathbb R}^n)$).
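One payoff of the tempered-distribution viewpoint is that the Fourier transform applies to $\mu$, and for a probability measure the result is (up to sign conventions) the familiar characteristic function $t\mapsto \int e^{itx}\,d\mu(x)$. A quick numerical sketch for the standard Gaussian on $\mathbb{R}$, whose characteristic function is $e^{-t^2/2}$:

```python
import numpy as np
from scipy import integrate, stats

# Sketch: the characteristic function of the standard normal measure,
# i.e. its Fourier transform in the tempered-distribution sense,
# computed by direct integration against the density.
def char_fn(t):
    re, _ = integrate.quad(lambda x: np.cos(t * x) * stats.norm.pdf(x),
                           -np.inf, np.inf)
    im, _ = integrate.quad(lambda x: np.sin(t * x) * stats.norm.pdf(x),
                           -np.inf, np.inf)
    return complex(re, im)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, char_fn(t), np.exp(-t**2 / 2))  # the two columns agree
```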
Since $C_0({\mathbb R}^n)$ neither contains nor is contained in $C^\infty ({\mathbb R}^n)$, there isn't such a nice relationship between probability measures and compactly supported distributions (elements of the dual of $C^\infty ({\mathbb R}^n)$). Nevertheless, if a probability measure has compact support, it can be used to integrate functions in $C^\infty ({\mathbb R}^n)$, and hence it can be seen as a compactly supported distribution.
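The simplest example of the latter is the Dirac measure $\delta_0$ (a unit point mass at the origin): its support is the compact set $\{0\}$, and as a functional on $C^\infty({\mathbb R}^n)$ it acts by $$ \langle \delta_0, f\rangle = \int_{{\mathbb R}^n} f\, d\delta_0 = f(0), \qquad f\in C^\infty({\mathbb R}^n), $$ so $\delta_0$ belongs to $\mathscr{E}'$ (and hence also to $\mathscr{S}'$ and $\mathscr{D}'$).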
It should be stressed that the expression "probability distribution", as in Gaussian distribution, has a totally different meaning, and should not be confused with the spaces of distributions $\mathscr{E}'$, $\mathscr{S}'$, and $\mathscr{D}'$ we've been discussing so far.
By definition, a probability distribution, also called a probability density, on ${\mathbb R}^n$ is a non-negative (Borel or Lebesgue) measurable function $p$ defined on ${\mathbb R}^n$, such that $$ \int_{{\mathbb R}^n} p(x)\, dx=1. $$
Given such a probability distribution, one can define a probability measure $\mu _p$ by $$ \mu _p(E)= \int_E p(x)\, dx, \tag{1} $$ for every (Borel/Lebesgue) measurable set $E$.
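As a numerical sketch of (1), take $p$ to be the standard normal density on $\mathbb{R}$ and $E=[a,b]$ an interval; then $\mu_p(E)$ is just $\int_a^b p(x)\,dx$, which agrees with the difference of the CDF values at the endpoints:

```python
import numpy as np
from scipy import integrate, stats

# Sketch: the measure mu_p induced by the standard normal density p
# via mu_p(E) = \int_E p(x) dx, for an interval E = [a, b].
p = stats.norm.pdf

def mu_p(a, b):
    """mu_p([a, b]) = \int_a^b p(x) dx, by numerical quadrature."""
    val, _ = integrate.quad(p, a, b)
    return val

# The whole line has total mass 1, and mu_p agrees with CDF differences.
total = mu_p(-np.inf, np.inf)
print(total)  # ≈ 1.0
print(mu_p(-1, 1), stats.norm.cdf(1) - stats.norm.cdf(-1))  # both ≈ 0.6827
```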
A very large number of the probability measures met in practice (Gaussian, exponential, chi-squared) arise in this way, so much so that this point of view is often the predominant one, while statistics students are sometimes not sufficiently exposed, I believe, to the true notion of probability spaces mentioned in the first paragraph of this answer.
The famous Radon-Nikodym theorem gives a necessary and sufficient condition for a Borel probability measure $\mu $ to be given in terms of a probability distribution: $\mu $ must assign zero probability to every event with Lebesgue measure zero, i.e. $\mu $ must be absolutely continuous with respect to Lebesgue measure, a condition that is readily verified when $\mu $ is given as in (1).
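For a measure that fails the condition, consider the Dirac measure $\delta_0$ (a unit point mass at the origin): the singleton $\{0\}$ has Lebesgue measure zero, yet $\delta_0(\{0\})=1$. Hence $\delta_0$ is not absolutely continuous with respect to Lebesgue measure and admits no probability density, even though it is a perfectly good tempered distribution.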