I am struggling a little to understand almost sure convergence in probability theory. I have taken some general measure theory, where we learned about convergence almost everywhere. Basically it was defined so that if a function sequence $f_n$ converges almost everywhere, the set where it does not converge has measure 0.
My question is about the almost sure convergence in probability theory: http://en.wikipedia.org/wiki/Convergence_of_random_variables#Almost_sure_convergence
I do not really see what the space $\Omega$ and the value $\omega$ are. For instance, say you have a random variable $X_n$ which is normally distributed, and say you have $N$ of these that are independent and identically normally distributed. Then I know that $y_N$, the average of these, converges a.s. to the expectation $\mu$. However, what are $\Omega$ and $\omega$ in this case?
I mean, for each random variable we have a probability space whose sample space is the real numbers, the appropriate $\sigma$-algebra on the real numbers (is this the Lebesgue $\sigma$-algebra?), and the measure given by integrating the normal density function(?). Do we have to construct $\Omega$ and $\omega$ from this?
Do you know of any good books that transition from ordinary measure theory to probability theory?
If you consider an infinite sequence $\{X_1, X_2, X_3, \ldots\}$ of random variables, you can define $\omega$ as the particular outcome of this infinite sequence. So a particular outcome has the form $\omega = (x_1, x_2, x_3, \ldots)$, where each $x_i$ is a real number. Then $\Omega$ is the set of all possible $\omega$ outcomes, being the set of all sequences of real numbers. [Edit a few days later: In some problems it is easier to build the probability model starting with a set $\Omega$ of all outcomes, where the outcomes $\omega$ already have a structure that is natural to the problem. Then you define $X_1=X_1(\omega)$, $X_2=X_2(\omega)$, and so on, so that each $X_i(\omega)$ is a function of the outcome $\omega$. With either construction, the outcome $\omega$ completely determines the infinite sequence $\{X_1, X_2, X_3, \ldots\}$.]
So if you define $Y_N = \frac{1}{N}\sum_{i=1}^NX_i$, then $Y_N$ has a probability distribution that depends on the joint CDF $Pr[X_1\leq x_1, X_2\leq x_2, \ldots, X_N\leq x_N]$. You see that as $N$ gets large, the joint CDF function includes more and more of the variables. We can think of defining the probability measure over the infinite sequence according to limits of the finite CDF functions.
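For intuition, here is a quick Python sketch: one simulated outcome $\omega$ is a long run of i.i.d. normal draws, and the running averages $Y_N$ settle near the mean. (The choices $\mu = 2$, $n = 100{,}000$, and the seed are arbitrary, purely for illustration.)

```python
import random

random.seed(0)

# One simulated outcome omega: a long run of i.i.d. N(mu, 1) draws.
# The running averages Y_N = (X_1 + ... + X_N)/N settle near mu,
# illustrating the almost sure convergence of the sample mean.
mu = 2.0
n = 100_000
xs = [random.gauss(mu, 1.0) for _ in range(n)]

running_sum = 0.0
averages = []
for i, x in enumerate(xs, start=1):
    running_sum += x
    averages.append(running_sum / i)

print(averages[9], averages[-1])  # Y_10 vs Y_100000
```

Re-running with a different seed gives a different outcome $\omega$, i.e., a different sequence of draws, but the running average still heads toward $\mu$.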
The statement $Y_N\rightarrow c$ with probability 1 (where $c$ is a particular real number) can be understood in the following two equivalent ways:
1) $Pr[\lim_{N\rightarrow\infty} Y_N = c] = 1$.
2) For all $\epsilon>0$ we have:
$\lim_{N\rightarrow\infty} Pr[|Y_N-c|\leq \epsilon, |Y_{N+1}-c|\leq \epsilon, |Y_{N+2}-c|\leq \epsilon, \ldots] = 1$.
Note that $Pr[|Y_N-c|\leq \epsilon, |Y_{N+1}-c|\leq \epsilon, |Y_{N+2}-c|\leq \epsilon, \ldots] = Pr[\cap_{i=N}^{\infty} \{|Y_{i}-c|\leq \epsilon\}]$ is the probability that all random variables $Y_i$ are within $\epsilon$ of $c$ (for all $i \geq N$). This condition is much stronger than the condition for convergence in probability, which only requires that for all $\epsilon>0$ we have $\lim_{N\rightarrow\infty}Pr[|Y_N-c|\leq \epsilon] = 1$. Thus, convergence with probability 1 implies convergence in probability.
To prove that $Y_N\rightarrow c$ with probability 1, it can be shown that it suffices to prove that for all $\epsilon>0$ we have $\sum_{i=1}^{\infty} Pr[|Y_i-c|>\epsilon] < \infty$. That is because, by the union bound:
\begin{eqnarray*} 1-Pr[\cap_{i=N}^{\infty}\{|Y_i-c|\leq \epsilon\}] &=& Pr[\cup_{i=N}^{\infty} \{|Y_i-c|>\epsilon\}] \\ &\leq& \sum_{i=N}^{\infty}Pr[|Y_i-c|>\epsilon] \end{eqnarray*} and a sufficient condition for the final summation to converge to $0$ as $N\rightarrow\infty$ is $\sum_{i=1}^{\infty} Pr[|Y_i-c|>\epsilon] < \infty$.
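As a numeric illustration of why the summation test makes the union-bound tail vanish, suppose (hypothetically, just for this sketch) that $Pr[|Y_i-c|>\epsilon] = 1/i^2$. The full sum is finite, and the truncated tails $\sum_{i=N} 1/i^2$ shrink toward 0 as $N$ grows:

```python
# Hypothetical tail probabilities p(i) = 1/i^2, chosen for illustration:
# sum p(i) is finite, so the union-bound tails sum_{i=N} p(i)
# shrink toward 0 as N grows.
def tail_sum(p, start, cutoff):
    """Truncated tail sum: p(start) + p(start+1) + ... + p(cutoff-1)."""
    return sum(p(i) for i in range(start, cutoff))

p = lambda i: 1.0 / i ** 2
print(tail_sum(p, 1, 10**6))     # close to pi^2/6
print(tail_sum(p, 1000, 10**6))  # roughly 1/999, already tiny
```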
Example 1: Let $\{Z_1, Z_2, Z_3, \ldots\}$ be independent random variables such that: \begin{eqnarray*} Z_i = \left\{\begin{array}{cc} 1 & \mbox{with probability $1/i^2$} \\ 0 & \mbox{else} \end{array}\right. \end{eqnarray*}
Then for any $\epsilon$ such that $0<\epsilon<1$ we have $\sum_{i=1}^{\infty} Pr[Z_i>\epsilon] = \sum_{i=1}^{\infty} 1/i^2 < \infty$, and so $Z_i\rightarrow 0$ with probability 1. This argument did not use independence of the $Z_i$ variables, and so it would be true whenever their marginal distributions satisfy the above.
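A simulated run of Example 1 matches this conclusion: only a handful of ones ever appear, and the last one shows up at a small index. (The seed and the cutoff $n$ are arbitrary choices for the sketch.)

```python
import random

random.seed(1)

# Simulated run of Example 1: Z_i = 1 with probability 1/i^2.
# Consistent with convergence with probability 1, only a few ones
# appear, and the last one occurs at a small index.
n = 100_000
last_one, ones = 0, 0
for i in range(1, n + 1):
    if random.random() < 1.0 / i ** 2:
        last_one, ones = i, ones + 1
print("ones:", ones, "last index with Z_i = 1:", last_one)
```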
Example 2: Let $\{Z_1, Z_2, Z_3, \ldots\}$ be independent random variables such that: \begin{eqnarray*} Z_i = \left\{\begin{array}{cc} 1 & \mbox{with probability $1/i$} \\ 0 & \mbox{else} \end{array}\right. \end{eqnarray*}
Then for any $\epsilon$ such that $0<\epsilon<1$ we have $\sum_{i=1}^{\infty} Pr[Z_i>\epsilon] = \sum_{i=1}^{\infty} 1/i = \infty$. So this summation test does not allow us to conclude anything about convergence with probability 1. Since the $Z_i$ variables are independent, it can be shown that, with probability 1, there will be an infinite number of indices $i$ such that $Z_i=1$. So $Z_i$ does not converge to 0 with probability 1. However, it is easy to see that these $Z_i$ variables do converge to 0 in probability.
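A simulated run of Example 2 shows the contrast: the expected number of ones up to index $n$ is the harmonic number $H_n \approx \ln n$, so ones keep arriving no matter how far out we look, even though each individual $Pr[Z_i=1]=1/i$ shrinks to 0. (The seed and cutoff $n$ are again arbitrary choices.)

```python
import random

random.seed(2)

# Simulated run of Example 2: Z_i = 1 with probability 1/i. The expected
# number of ones among the first n variables is the harmonic number
# H_n ~ ln(n), so ones keep appearing arbitrarily far out.
n = 100_000
ones = sum(1 for i in range(1, n + 1) if random.random() < 1.0 / i)
print("ones among the first", n, "variables:", ones)
```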