What does it mean to be the variance of a single value?


I've seen several sources present proofs of the Weak Law of Large Numbers (WLLN), and they all contain something along the lines of the following:

Using Chebyshev's inequality $ P(|X-\mu|\geq a) \leq \frac{Var(X)}{a^2}$, we have,

$$ P(|\overline{X}-\mu|\geq a) \leq \frac{Var(\overline{X})}{a^2} $$
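As a sketch of what this bound claims, the inequality for $\overline{X}$ can be checked numerically (standard-library Python; the Exp(1) distribution, the sample size, and the threshold $a$ are illustrative assumptions, not anything from the proof itself):

```python
import random

random.seed(0)
n, trials, a = 50, 20000, 0.5
mu, sigma2 = 1.0, 1.0  # Exp(1): mean 1, variance 1

# Empirical estimate of P(|Xbar - mu| >= a) over many repetitions
hits = 0
for _ in range(trials):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    if abs(xbar - mu) >= a:
        hits += 1

empirical = hits / trials
bound = (sigma2 / n) / a**2  # Var(Xbar)/a^2 = sigma^2 / (n * a^2)
assert empirical <= bound    # Chebyshev's bound holds (here it is very loose)
```

The bound is far from tight in this example; Chebyshev only needs a finite variance, so it trades sharpness for generality.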

Reflection:
There appears to be some very loose substitution of $X$ with $\overline{X}$, with no justification, and this bothers me. My understanding is that variance is a function of $X$, which is itself a function: specifically, the variance is the average squared distance from the population mean $\mu$ across ALL $X_i$ in the range (set) of $X$. By contrast, $\overline{X}$ is just a single value. Furthermore, in the algebraic proof that $Var(\overline{X}) = \frac{\sigma^2}{n}$ (which is also used as part of the overall WLLN proof), there is a step that simplifies $Var(X_i)$ to $\sigma^2$, so once again variance is being applied to a single value.

What's more disturbing is that results from Googling "variance of a single value" suggest that it is 0.

What's even more disturbing is that authors simply use variance this way (with the only previous definition of it being on a set of values), and no one questions it.

Last I checked, a set and a single value are different constructs.

Hence...

Question:
What does it mean to be the variance of a single value?

Notes based on answers:

  1. There appears to be some confusion between $X$ and $X_i$. My understanding is that $X$ is the probability function of the probability space $(\Omega, \mathcal{U}, P)$, and operates on individual elements $\omega$ of the sample space $\Omega$. Whereas $X_i$ is a single observation of $X$, i.e., when I read $X_i$, I am actually reading $X(\omega_i)$, where $\omega_i$ is the $i$th element of the sample space. In the case where $\Omega$ is heights of people, $\omega_i$ might be John's height ($\omega_{i+1}$ might be Jack's height): $X(\text{John's height}) = 173.2$ cm, $X(\text{Jack's height}) = 181.5$ cm. So variance of $X$ makes sense, while variance of a single observation of $X$ does not. Now if $X_i$ is in fact not what I understand it to be, then a more formal definition of $X_i$ would go a long way.
Accepted answer:

The short answer is that the kind of mean and variance that you use in the question are definitions used in descriptive statistics, not probability, whereas the mean and variance referred to in the weak law of large numbers are a different thing entirely.

The longer answer:

The weak law of large numbers (WLLN) is a theorem in the theory of probability. In the theory of probability, a real-valued random variable (such as one of the $X_i$ in a sequence of i.i.d. variables) is a function from an underlying probability space to the real numbers. We might say in one example that $X_i$ has the value $0$ with probability $\frac23$ and $1$ with probability $\frac13,$ or we might say that $X_i$ has a standard normal distribution.

A random variable defined in this way has (usually) an expected value and (usually) a variance. When the variance exists, it is typically not zero unless the probability distribution is trivial, such as $X_i = 2$ with probability $1,$ in which case the variance is zero.
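To make the function-on-a-probability-space picture concrete, here is a minimal sketch (the three-outcome space is an assumption chosen to realize the $\frac23$/$\frac13$ example above; any space carrying the same distribution would do):

```python
from fractions import Fraction as F

# A tiny probability space: three equally likely outcomes
Omega = ["a", "b", "c"]
P = {w: F(1, 3) for w in Omega}

# A random variable is a function from Omega to the reals:
# X = 0 on {"a", "b"} (probability 2/3) and X = 1 on {"c"} (probability 1/3)
def X(w):
    return 1 if w == "c" else 0

mean = sum(P[w] * X(w) for w in Omega)               # E[X] = 1/3
var = sum(P[w] * (X(w) - mean) ** 2 for w in Omega)  # Var(X) = 2/9

# A trivial (constant) random variable has variance zero
def Y(w):
    return 2

varY = sum(P[w] * (Y(w) - 2) ** 2 for w in Omega)    # Var(Y) = 0
assert (mean, var, varY) == (F(1, 3), F(2, 9), 0)
```

Note that the mean and variance here are computed from the distribution alone, before anyone observes which $\omega$ actually occurred.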

In summary, in a sequence of i.i.d. random variables $X_1,X_2,X_3,\ldots$ to which the weak law of large numbers might apply, each variable has a mean and variance that are determined by the probability distribution of that random variable without knowing what particular value that variable will take by chance and without any regard to the values that any other variables in the sequence might take by chance.

This will make no sense at all if you try to interpret it by taking a list of numbers such as $3,14,2,4,9,1$ and considering the mean and variance of those numbers. The number $3$ is not a random variable (at least, not a non-trivial one). The mean and variance of that list of numbers are descriptive statistics, numbers that describe properties of the list, not of the individual elements of the list.
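For contrast, the descriptive-statistics notions applied to that same list look like this (standard-library Python; `pvariance` divides by the list length, matching the descriptive definition):

```python
import statistics

data = [3, 14, 2, 4, 9, 1]
m = statistics.mean(data)       # descriptive mean of the list: 5.5
v = statistics.pvariance(data)  # descriptive (population) variance of the list

# The "variance of a single value" in the descriptive sense is indeed 0,
# which is the result the question found by searching:
v_single = statistics.pvariance([3])
assert v_single == 0
```

These numbers describe the list as a whole; they say nothing probabilistic about the individual entries.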

We use the terms mean and variance for both random variables and descriptive statistics because they have similar formulas and analogous properties. It's an unfortunate side effect that people sometimes use the definition from one context in another context where it does not apply.

It doesn't help that explanations of the law of large numbers (weak or otherwise) tend to incorporate other language of descriptive statistics, such as "sample" or "outcomes of a series of experiments." A motivation for using those terms is that this theorem of probability will be used to justify some methods of descriptive statistics, and they give hints about how it will be used, but the theorem itself is still fundamentally probabilistic, not descriptive.


As mentioned in comments and illustrated in the updated question, a more concrete description of a suitable sample space for the WLLN may help.

First of all, while most texts don't specify this explicitly, if you look at typical examples of sample spaces you may notice that the outcomes in the sample space are mutually exclusive. For example, a typical sample space for two tosses of a coin is $\{ HH, HT, TH, TT \},$ a set of four outcomes, exactly one of which will occur. If the two tosses come up $HT$ in that order, they cannot also have come up $TH$ or $HH.$
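A sketch of this sample space, with the per-toss random variables defined as functions on it (the indicator encoding of heads as 1 is an assumption for illustration):

```python
from fractions import Fraction as F
from itertools import product

# Sample space for two tosses of a fair coin: four mutually exclusive outcomes
Omega = ["".join(t) for t in product("HT", repeat=2)]  # HH, HT, TH, TT
P = {w: F(1, 4) for w in Omega}

def X1(w):  # 1 if the first toss is heads
    return 1 if w[0] == "H" else 0

def X2(w):  # 1 if the second toss is heads
    return 1 if w[1] == "H" else 0

# Exactly one outcome occurs, and the probabilities sum to 1
assert sum(P.values()) == 1

# X1 and X2 are independent on this space: P(X1=1, X2=1) = P(X1=1) * P(X2=1)
p_both = sum(P[w] for w in Omega if X1(w) == 1 and X2(w) == 1)
p1 = sum(P[w] for w in Omega if X1(w) == 1)
p2 = sum(P[w] for w in Omega if X2(w) == 1)
assert p_both == p1 * p2  # 1/4 == 1/2 * 1/2
```

One outcome like $HT$ carries the result of *both* tosses; the outcomes themselves never overlap.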

If all you are ever going to do in a particular probability exercise is to choose one person living in a certain region at a certain time and measure that one person's height, you could use a list of the heights of people living in that region at that time as a sample space. But if you're going to do more than that, you will need a bigger sample space; when you use the list of heights of people as your sample space, you can't talk about the probability that you will have measured both John's height and Jack's height, because those acts of measurement are mutually exclusive outcomes.

One useful sample space for the weak law of large numbers would be the set $\Omega$ of all infinite sequences of real numbers. The aim here is (roughly speaking) that if $\omega = (a_1, a_2, a_3, \ldots) \in \Omega,$ we could interpret $a_1$ as the height of the first person selected from a population, $a_2$ as the height of the second person, and so forth.

You might see a problem right away with using such a sample space for the height measurements of a sample of people from the people living in a certain region at a certain time: you have only finitely many people to assign to positions in an infinite sequence.

The problem arises because if you restrict yourself to selecting samples of people from a fixed finite set of people without replacement, you can't say anything about what happens with $(X_1,\ldots,X_n)$ in the limit as $n\to\infty$, because to define the limit you need to consider cases where $n$ is greater than the number of people in your finite set. Another hitch is that when you select your sample without replacement, $X_2$ is not independent of $X_1.$

One way to resolve these problems is to select your sample not from a finite set of actual living people, but to select from an infinite set of all people who could possibly have lived in any possible history of the world. A simpler way to resolve it is to allow the sample to be taken with replacement so that if John's height is measured first, John still has just as much chance to be selected for the second measurement as he originally had to be selected for the first measurement, and John (or anyone else) can be measured as many times as it takes to fill out a list of $n$ measurements.
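The dependence introduced by sampling without replacement can be computed exactly on a toy population (the three hypothetical heights below are an assumption; exact fractions via `Fraction` avoid rounding):

```python
from fractions import Fraction as F
from itertools import permutations, product

heights = [160, 170, 180]  # hypothetical tiny population

# Ordered pairs of draws: without replacement vs. with replacement
pairs_without = list(permutations(heights, 2))
pairs_with = list(product(heights, repeat=2))

def cov(pairs):
    """Covariance of the first and second draw, pairs equally likely."""
    n = len(pairs)
    m1 = F(sum(a for a, _ in pairs), n)
    m2 = F(sum(b for _, b in pairs), n)
    return F(sum((a - m1) * (b - m2) for a, b in pairs), n)

assert cov(pairs_without) != 0  # X_2 depends on X_1 (negative covariance)
assert cov(pairs_with) == 0     # with replacement, the draws are independent
```

With replacement the second measurement forgets the first, which is what the i.i.d. assumption in the WLLN requires.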

To define a $\sigma$-algebra $\mathcal F$ of events and a probability function $\mathbb P$ over this sample space $\Omega,$ I will adapt a sample space described in the first few pages of a set of notes by Greg Lawler at the University of Chicago.

For each positive integer $n$, we define $\Omega_n$ as the set of finite sequences of real numbers of length $n.$ We define $\mathcal F_n$ as a $\sigma$-algebra on $\Omega_n$ and we define $\mathbb P_n$ as a probability measure on $\mathcal F_n.$ (The exact forms of $\mathcal F_n$ and $\mathbb P_n$ will have to support whatever distribution we want $X_i$ to have in our particular application of the WLLN; in Lawler's notes $X_i$ is a Bernoulli variable with $p=\frac12.$)

Now we can define an event $A \subseteq \Omega$ in terms of a set $E$ of finite sequences as follows:

$$ A = \{(a_1,a_2,\ldots)\mid (a_1,\ldots,a_n)\in E\subseteq \Omega_n\}. $$

Define $\mathbb P^0(A) = \mathbb P_n(E).$ This defines a function $\mathbb P^0$ on a set algebra $$ \mathcal F^0 = \bigcup_{n=1}^\infty \mathcal F_n. $$ Sadly, $\mathcal F^0$ is not a $\sigma$-algebra, so we have to invoke some advanced mathematics to extend $\mathcal F^0$ to a $\sigma$-algebra $\mathcal F$ and to extend $\mathbb P^0$ to a probability function $\mathbb P$ on $\mathcal F.$

Now we can apply the WLLN to the probability space $(\Omega,\mathcal F,\mathbb P).$ Let $\omega = (a_1, a_2, a_3, \ldots) \in \Omega.$ Then $X_1(\omega) = a_1,$ $X_2(\omega) = a_2,$ and in general $X_i(\omega) = a_i.$ If $\overline{X} = \frac1n\sum_{i=1}^n X_i,$ then $\overline{X}(\omega)$ is the arithmetic mean of the first $n$ numbers in $\omega.$

This defines each $X_i$ as a random variable and also defines $\overline{X}$ as a random variable. In order to actually use the WLLN, we will have had to define $\mathbb P$ in such a way that the probability distribution of each $X_i$ is the same as the probability distribution of any $X_j$ and so that any particular $X_i$ is independent of all $X_j$ for $j\neq i.$
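The coordinate random variables $X_i(\omega) = a_i$, and the average of the first $n$ coordinates evaluated at one outcome $\omega$, can be sketched directly (a truncated sequence stands in for the infinite one, and the standard normal distribution is an assumption for illustration):

```python
import random

random.seed(1)
n = 10000

# One outcome omega: a (truncated) sequence of real numbers,
# here drawn i.i.d. standard normal purely for illustration
omega = [random.gauss(0, 1) for _ in range(n)]

def X(i, w):
    """The i-th coordinate random variable: X_i(omega) = a_i (1-indexed)."""
    return w[i - 1]

# The average of the first n coordinates, evaluated at this one omega
xbar = sum(X(i, omega) for i in range(1, n + 1)) / n
assert abs(xbar - 0.0) < 0.1  # near mu = 0, as the WLLN leads us to expect
```

Each $X_i$ is a function of the whole outcome $\omega$; only after $\omega$ is fixed does $X_i(\omega)$ become a single number.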

As you may have noticed, it takes a lot more ink and effort to explicitly describe a suitable sample space for an application of the WLLN than to make the usual statement of the theorem. (Really a lot more ink and effort, because I have glossed over significant parts of the description, which could be much, much longer if every detail were filled in.) It is much simpler to describe the properties of all the random variables you will need for such an exercise (including random variables that are defined in terms of other random variables), including their joint distributions, with the understanding that a suitable sample space for these variables exists, so that's how things are normally done.

Second answer:

Just as $X_i$ is a random variable, the sample average $\bar X=\frac{1}{n}\sum_{i=1}^n X_i$ is also a random variable. The variance of a random variable is some nonnegative constant. If the random variable itself is (almost surely) a constant, then its variance is zero.
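This point can be seen by simulation: generate many realizations of $\bar X$ and look at how they spread (standard-library Python; the Uniform(0,1) distribution, with $\sigma^2 = \frac1{12}$, and the sample size are illustrative assumptions):

```python
import random
import statistics

random.seed(0)
n, reps = 25, 40000

# Each entry below is one realization of the random variable Xbar,
# the mean of n i.i.d. Uniform(0,1) draws (sigma^2 = 1/12)
xbars = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

# The realizations vary, and their variance matches sigma^2 / n
var_xbar = statistics.pvariance(xbars)
assert abs(var_xbar - 1 / (12 * n)) < 5e-4  # Var(Xbar) = sigma^2 / n
```

A single computed average is one realization of $\bar X$, just as a single height measurement is one realization of $X_i$; the variance belongs to the random variable, not to any one realized number.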