Almost sure convergence in the strong law of large numbers


The Strong Law of Large Numbers is often stated as $$\overline{X}_n\ \xrightarrow{a.s.}\ \mu \qquad\textrm{as}\ n \to \infty$$ or $$\Pr\!\left( \lim_{n\to\infty}\overline{X}_n = \mu \right) = 1,$$ where $\overline{X}_n$ is the average of $n$ i.i.d. random variables with mean $\mu$.
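
For intuition, here is a minimal simulation sketch in Python/NumPy (the exponential distribution, the seed, and the sample sizes are illustrative choices, not part of the statement): along a single realization, the running averages settle near $\mu$.

```python
import numpy as np

# Minimal simulation sketch: draw one long i.i.d. sample and watch the
# running averages settle near mu.  (Distribution and parameters are
# illustrative choices only.)
rng = np.random.default_rng(0)
mu = 2.0
n = 100_000
x = rng.exponential(scale=mu, size=n)            # i.i.d. draws with mean mu
running_mean = np.cumsum(x) / np.arange(1, n + 1)

for k in (10, 1_000, 100_000):
    print(f"mean of first {k} draws: {running_mean[k - 1]:.4f}")  # -> near 2.0
```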

It seems to me from the definition that, in order to have a notion of "almost sure convergence", we must have a sequence of random variables $X_i$ on the same probability space; at the same time, $\overline{X}_n$ is a random variable on the product of the probability spaces of the first $n$ of the $X_i$'s. Of course we can think of all the $\overline{X}_k$'s for $k \leq n$ as living on the $n$th product, but to consider all $n$ at the same time we would need an infinite product. This would make sense, but seems somewhat complicated (infinite product spaces!). So, am I missing something, or is this what's going on and all elementary introductions (and Wikipedia) just suppress this point?

Edit: OK, so the original question was a bit misleading, and the clarification is a bit long (see the comments below), but I decided to post it as an answer as well; it overlaps a bit with the accepted one.


There are 3 answers below.

Best answer (score 9)

Independence concerns random variables defined on a common probability space. To see this, assume that $X:(\Omega,\mathcal F)\to(E,\mathcal E)$ and $Y:(\Psi,\mathcal G)\to(E,\mathcal E)$ are random variables. To show that $X$ and $Y$ are independent, one would consider events such as $$ [X\in B]\cap[Y\in C]=\{\omega\in\Omega\mid X(\omega)\in B\}\cap\{\psi\in\Psi\mid Y(\psi)\in C\}. $$ Unless $(\Omega,\mathcal F)=(\Psi,\mathcal G)$, this simply does not make sense.
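
To make this concrete, here is a toy sketch (a hypothetical finite example, not from the answer itself): two random variables on the same space $\Omega=\{0,1\}^2$ with the uniform measure, where events $[X=b]$ and $[Y=c]$ are subsets of the same $\Omega$, so their intersection makes sense and independence can be checked directly.

```python
from itertools import product
from fractions import Fraction

# Toy sketch: two random variables on the SAME finite probability space
# Omega = {0,1}^2 with the uniform measure.  Since X and Y share a domain,
# [X = b] and [Y = c] are subsets of the same Omega, and independence can
# be verified by enumerating events.
Omega = list(product([0, 1], repeat=2))
P = {w: Fraction(1, 4) for w in Omega}           # uniform probability on Omega

def X(w): return w[0]                            # first coordinate
def Y(w): return w[1]                            # second coordinate

def prob(event):                                 # P of a subset of Omega
    return sum(P[w] for w in Omega if event(w))

for b, c in product([0, 1], repeat=2):
    joint = prob(lambda w: X(w) == b and Y(w) == c)
    assert joint == prob(lambda w: X(w) == b) * prob(lambda w: Y(w) == c)
```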

...$\overline{X}_n$ is a random variable on the product of probability spaces of the first $n$ of the $X_i$'s...

Not at all. The random variable $\overline{X}_n$ can only be defined on the common probability space on which every $X_n$ is defined. To define sums $X+Y$ such as the ones every $\overline{X}_n$ requires, one considers $$X+Y:\omega\mapsto X(\omega)+Y(\omega). $$
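
Continuing the same toy finite example (again a hypothetical illustration): sums and averages are defined pointwise on the common space, so $\overline{X}_2$ is again a function on $\Omega$.

```python
from itertools import product

# Toy sketch: sums and averages are defined pointwise on the common space,
# so Xbar_2 = (X + Y)/2 is again a random variable on the same Omega.
Omega = list(product([0, 1], repeat=2))
def X(w): return w[0]
def Y(w): return w[1]

def Xbar2(w):
    return (X(w) + Y(w)) / 2                     # (X+Y)(omega) = X(omega)+Y(omega)

print({w: Xbar2(w) for w in Omega})              # {(0, 0): 0.0, (0, 1): 0.5, ...}
```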

Maybe one needs infinite product spaces to even talk about a sequence of i.i.d. $X_i$'s

One does not, for the reasons above. If one insists on using a product space, the construction is as follows. Assume that $X_i:(\Omega_i,\mathcal F_i)\to(E,\mathcal E)$, consider $\Omega=\prod\limits_i\Omega_i$, $\mathcal F=\mathop{\otimes}_i\mathcal F_i$ and, for every $i$, the random variable $Z_i:(\Omega,\mathcal F)\to(E,\mathcal E)$ defined by $Z_i(\omega)=X_i(\omega_i)$ for every $\omega=(\omega_i)_i$ in $\Omega$. If each $(\Omega_i,\mathcal F_i)$ is endowed with a probability $P_i$ such that the distribution $P_i\circ X_i^{-1}$ does not depend on $i$, and if $(\Omega,\mathcal F)$ is endowed with the product probability $P=\mathop{\otimes}_iP_i$, then indeed $(Z_i)$ is i.i.d. with common distribution $$ P\circ Z_i^{-1}=P_i\circ X_i^{-1}. $$ One may find this kind of construction fascinating. Usually though, after a while, the feeling passes... :-) and one sticks to the modus operandi most probabilists adopt, which is to consider that the exact nature of $(\Omega,\mathcal F,P)$ is irrelevant and that all that counts are the image measures on the target space.
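
A finite sketch of this construction (with each factor a fair coin and each $X_i$ the identity, so $Z_i$ is just the $i$th coordinate map; the genuinely infinite product needs the Kolmogorov extension theorem):

```python
from itertools import product
from fractions import Fraction

# Sketch of a finite version of the product construction: each factor space
# Omega_i is a fair coin, Omega = prod_i Omega_i carries the product
# measure, and Z_i reads the i-th coordinate: Z_i(omega) = omega_i.
n = 3
Omega = list(product([0, 1], repeat=n))          # product of n coin spaces
P = {w: Fraction(1, 2**n) for w in Omega}        # product measure

def Z(i):
    return lambda w: w[i]                        # coordinate map Z_i

def prob(event):
    return sum(P[w] for w in Omega if event(w))

# The Z_i are i.i.d.: every joint probability factorizes into marginals.
for bits in product([0, 1], repeat=n):
    joint = prob(lambda w: all(Z(i)(w) == bits[i] for i in range(n)))
    assert joint == Fraction(1, 2**n)            # = prod_i P(Z_i = bits[i])
```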

Answer (score 1)

Why would we need product spaces? The $X_i$'s, being random variables, are measurable functions. Consider the $\sigma$-field generated by $X_1,\dots,X_n$: then $\bar{X}_n$ is measurable with respect to this $\sigma$-field, and hence with respect to the $\sigma$-field generated by $X_1,X_2,\dots$. I think that is all you need.
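
The measurability claim can be pictured as a factorization (a hypothetical sketch, in the spirit of Doob-Dynkin): $\bar{X}_n$ depends on $\omega$ only through $(X_1(\omega),\dots,X_n(\omega))$, i.e. $\bar{X}_n = g \circ (X_1,\dots,X_n)$ with $g$ the measurable averaging map.

```python
# Sketch of the factorization behind the measurability claim:
# Xbar_n = g(X_1, ..., X_n), with g the averaging map on E^n,
# so Xbar_n is measurable w.r.t. the sigma-field generated by the X_i.
def make_xbar(Xs):
    def g(values):                               # averaging map on E^n
        return sum(values) / len(values)
    return lambda omega: g([X(omega) for X in Xs])

# Toy usage on Omega = {0,1}^3 with coordinate variables:
Xs = [lambda w, i=i: w[i] for i in range(3)]
xbar3 = make_xbar(Xs)
print(xbar3((1, 0, 1)))                          # (1 + 0 + 1) / 3 = 0.666...
```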

Answer (score 6)

A random variable $X:(\Omega,\mathcal F, P_\Omega)\to(E,\mathcal E)$ induces a canonical random variable $\hat{X}:(\Omega \times \Psi,\mathcal F \otimes \mathcal G, P_\Omega\times P_\Psi)\to(E,\mathcal E)$ on the product with any probability space $(\Psi,\mathcal G, P_\Psi)$ by precomposing $X$ with the projection. The two random variables $X$ and $\hat{X}$ are equidistributed. In addition, if $Y:(\Psi,\mathcal G, P_\Psi)\to(E,\mathcal E)$ is a random variable on $(\Psi,\mathcal G, P_\Psi)$, then $\hat{X}$ and $\hat{Y}$ are independent. Moreover, this works even if $Y=X$: we get $\hat{X}_1$ and $\hat{X}_2$ by composing with different projections. This is a "cheap way" to get i.i.d. random variables.
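
A small sketch of this "cheap" construction (hypothetical finite example): a single biased coin $X$ on $\Omega$ is lifted to $\Omega\times\Omega$ through the two projections, and the lifts are i.i.d. copies of $X$.

```python
from itertools import product
from fractions import Fraction

# Sketch: one biased coin X on Omega, lifted to Omega x Omega by the two
# projections.  The lifts X_hat1, X_hat2 are then i.i.d. copies of X.
Omega = [0, 1]
P = {0: Fraction(1, 3), 1: Fraction(2, 3)}       # law of X on Omega

def X(w): return w                               # identity, for simplicity

Omega2 = list(product(Omega, repeat=2))
P2 = {w: P[w[0]] * P[w[1]] for w in Omega2}      # product measure on Omega^2

def X_hat1(w): return X(w[0])                    # X after the first projection
def X_hat2(w): return X(w[1])                    # X after the second projection

def prob(event):
    return sum(P2[w] for w in Omega2 if event(w))

for b, c in product(Omega, repeat=2):
    joint = prob(lambda w: X_hat1(w) == b and X_hat2(w) == c)
    assert joint == P[b] * P[c]                  # independent, each with law P
```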

Now, a pair of random variables $X, Y$ valued in $(E,\mathcal E)$ on the same probability space $(\Omega, \mathcal F, P)$ (as is the case with i.i.d. variables) gives a map from $\Omega$ to $E^2$. Moreover, under some reasonable conditions the image of this map is big. (For example, if $E$ is a topological space, $\mathcal E$ is the Borel $\sigma$-algebra, and the supports of $X$ and $Y$ are all of $E$, then the image of this map is dense in $E^2$.)

In this spirit, if I have an infinite sequence of i.i.d. random variables, then the space on which they are defined maps to $E^\infty$, and under the same assumptions this map will again have a big image. To me this says that the space they are defined on is "big", in some sense as big as the infinite product. In fact, other than "fake" examples like $[0,1)$ with $X_i(x) = (\text{the $i$th binary digit of } x)$, which is just $\{0,1\}^\infty$ in disguise, I don't know any way to get an infinite sequence of i.i.d.'s on anything other than a product space.
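
The binary-digit example can be checked empirically (a sketch; the seed and sample size are illustrative choices): for $x$ uniform on $[0,1)$, the digit maps behave like i.i.d. fair coin flips.

```python
import numpy as np

# Sketch of the binary-digit example: for x uniform on [0,1), the maps
# X_i(x) = (i-th binary digit of x) behave like i.i.d. fair coin flips,
# which is exactly the {0,1}^infinity product space in disguise.
rng = np.random.default_rng(0)
x = rng.random(200_000)                          # uniform samples on [0, 1)

def digit(x, i):
    return np.floor(x * 2**i).astype(int) % 2    # i-th binary digit of x

d1, d2 = digit(x, 1), digit(x, 2)
print(d1.mean(), d2.mean())                      # both close to 0.5
print(np.mean(d1 & d2))                          # close to 0.25 = 0.5 * 0.5
```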