Interpretation of Kolmogorov's Law of Large Numbers and Convergence


My lecture slides begin,

Consider a sequence of random variables $X_1,X_2,\ldots,X_n$ (the lecturer stated that "random numbers" and random variables — which in my understanding are maps from a sample space to the real line — can be used interchangeably), all sampled from the same distribution, which has mean (expectation) $\mu$.

Then the arithmetic average of the sequence

$S_n=\frac{1}{n}(X_1+X_2+\dots+X_n)\approx E[X_1]=\mu$

and

$S_n=\frac{1}{n}(X_1+X_2+\dots+X_n)\rightarrow E[X_1]=\mu$ as $n\rightarrow \infty$.
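This statement can be checked empirically by simulation. A minimal sketch, assuming an exponential distribution with mean $\mu = 2$ (an arbitrary choice for illustration, not from the slides):

```python
# Empirical check of the law of large numbers: the sample average of
# i.i.d. draws approaches the expectation mu as n grows.
import random

random.seed(0)
mu = 2.0  # mean of the illustrative exponential distribution

def running_average(n):
    """S_n = (X_1 + ... + X_n) / n for n i.i.d. exponential draws."""
    return sum(random.expovariate(1.0 / mu) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, running_average(n))  # the averages drift toward mu = 2.0
```

Each run gives $S_n$ for one realization; the approximation $S_n \approx \mu$ tightens as $n$ grows.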

It then goes on to state almost sure convergence:

$X_n \rightarrow X \;\Leftrightarrow\; P\big(\{\omega\in\Omega \mid X_n(\omega)\rightarrow X(\omega) \text{ as } n\rightarrow \infty\}\big)=1.$

My question is a conceptual one: what do we mean by $X_n\rightarrow X$ and by $X_n(\omega)\rightarrow X(\omega)$?

For that matter, what do we mean by $X_n$? Are these approximations to the distribution $X$ after the $n$th trial? If so, when summing $S_n$, are we summing "maps from a sample space to the real line"? How would one add a map?

Am I correct in reading almost sure convergence as "$X_n$ becomes the distribution $X$ if and only if, with probability $1$, the outcome $\omega$ is such that $X_n(\omega)$ tends to $X(\omega)$ as we take successive approximations $X_n$ of $X$"?

Please help!


There are 2 answers below.

Answer 1:

First, a distribution is not the same thing as a random variable. A random variable $X$ is a map from a sample space, usually denoted $\Omega$, to $\mathbb{R}$ (or, more generally, to some measurable space). The distribution of a random variable is a probability measure on that measurable space.

When we write $S_n = (X_1+\dots +X_n)/n$, it just means that for each $\omega\in\Omega$, $S_n(\omega) = (X_1(\omega) + \dots + X_n(\omega))/n$. This is the usual sum operation for maps.
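The "sum of maps" is just the pointwise sum of functions, which can be made concrete by treating $\omega$ as an explicit argument. A toy sketch (the sample space and the particular $X_i$ here are invented purely for illustration):

```python
# Toy sample space: omega is a real number in [0, 1).
# Each X_i is a map omega -> R; S_n averages them pointwise.

def make_X(i):
    # Arbitrary illustrative random variable on this sample space.
    return lambda omega: (omega + i) % 1.0

X = [make_X(i) for i in range(1, 6)]  # X_1, ..., X_5

def S(n, omega):
    """S_n(omega) = (X_1(omega) + ... + X_n(omega)) / n."""
    return sum(X[i](omega) for i in range(n)) / n

print(S(5, 0.25))  # a single real number: the maps were added pointwise
```

Adding maps produces a new map; only after evaluating everything at the same $\omega$ do we get ordinary real-number addition.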

Now, almost sure convergence is stronger than convergence in distribution: it is pointwise convergence for almost every $\omega$. Here "almost every" means that the set of $\omega$'s for which the convergence holds has probability $1$.

So for whatever $\omega$ you pick in this probability-$1$ set, you will have $S_n(\omega)\simeq E[X_1]$ for $n$ large enough. Once you have picked $\omega$, it is just convergence of a sequence of real numbers. So almost sure convergence is not about the distributions but about the random variables themselves, which is a stronger statement.
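One way to see "picking an $\omega$" computationally: fixing a random seed determines the entire sequence $X_1(\omega), X_2(\omega), \ldots$, so $S_n(\omega)$ becomes an ordinary sequence of real numbers. A sketch using uniform draws on $[0,1]$ with $E[X_1] = 1/2$ (an arbitrary choice for illustration):

```python
import random

def trajectory(seed, checkpoints):
    """S_n(omega) along one realization; the seed plays the role of omega."""
    rng = random.Random(seed)
    total, out = 0.0, []
    for n in range(1, max(checkpoints) + 1):
        total += rng.uniform(0.0, 1.0)  # X_n(omega)
        if n in checkpoints:
            out.append(total / n)       # S_n(omega)
    return out

print(trajectory(42, {10, 1_000, 100_000}))  # tends toward E[X_1] = 0.5
```

Different seeds give different trajectories, but almost every one of them converges to the same limit $E[X_1]$.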

$X_n\rightarrow X$ is ambiguous because there are several modes of convergence for random variables, so we usually specify which one is meant. For almost sure convergence we write $$X_n\rightarrow X\ \textrm{a.s.}$$ where "a.s." means almost surely.

Answer 2:

I take it that the random variables $X_1,X_2,\ldots$ are assumed to be independent; this is an important assumption.

For a given element $\omega$ of the sample space, the sequence $X_1(\omega),X_2(\omega),\ldots$ is a sequence of real numbers. Saying that $X_n(\omega)\to X(\omega)$ is shorthand for saying that the limit $$ \lim_{n\to\infty}X_n(\omega) $$ exists and equals $X(\omega)$. That is, for every $\omega$ and every $\epsilon>0$, there exists a number $N(\omega,\epsilon)$ such that for all $n>N(\omega,\epsilon)$ we have $|X_n(\omega)-X(\omega)|<\epsilon$.
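The $\epsilon$–$N$ definition can be exercised on a concrete deterministic toy sequence, say $X_n(\omega) = \omega + 1/n$, which converges to $X(\omega) = \omega$ for every $\omega$ (this example is invented for illustration and is not the lecture's setup):

```python
import math

def X_n(n, omega):
    return omega + 1.0 / n   # toy sequence converging to X(omega) = omega

def X(omega):
    return omega

def N(omega, eps):
    # Here |X_n(omega) - X(omega)| = 1/n, so any N >= 1/eps works;
    # in this toy example N happens not to depend on omega.
    return math.ceil(1.0 / eps)

omega, eps = 0.3, 1e-3
n0 = N(omega, eps)
print(all(abs(X_n(n, omega) - X(omega)) < eps for n in range(n0 + 1, n0 + 500)))  # prints True
```

For random sequences the threshold $N(\omega,\epsilon)$ genuinely varies with $\omega$, which is exactly why almost sure convergence is stated outcome by outcome.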

Saying $X_n\to X$ (pointwise) is shorthand for saying that $X_n(\omega)\to X(\omega)$ for all $\omega$ in the sample space; for almost sure convergence, this is only required to hold on a set of $\omega$'s of probability $1$.

Each $X_n$ is a random variable, that is, a map from the sample space to $\mathbb R$. However, we are also assuming that these random variables are independent. For this to make sense, there must be a common sample space $\Omega$ on which $X_1,X_2,\ldots$ are all defined, and independence means that for every $n$ and every sequence of real numbers $a_1,\ldots,a_n$ the events $\{X_1<a_1\},\ldots,\{X_n<a_n\}$ are independent.