Statistical sampling and random variables?


I'm studying statistical sampling and there is a point which is not very clear to me. Let us discuss the following example.

Suppose we would like to study the heights of 3,000 students in a given school. Let us do this by taking 80 samples of size 25 each. For each sample there should be associated random variables, say $X_1, \ldots, X_{25}$, which give the heights.

My question is: what are the domains of $X_1, \ldots, X_{25}$?

It seems the domains can't be the respective sample sets, because one usually wants to perform some kind of algebraic manipulation using the values $X_i(\omega)$. For instance, in order to form

$$X_1(\omega)+\ldots+X_{25}(\omega)$$ the variable $\omega$ should belong to the intersection of all the domains of $X_1, \ldots, X_{25}$. On the other hand, if we defined $X_1, \ldots, X_{25}$ on the set of all students, what would $X_i(\omega)$ be if $\omega$ is not in the $i$th sample?

The only way I see to resolve this is to allow the random variables to be defined on the set of all students, although the only real data we have about each variable are its values on the respective sample set.

Can someone clear this up?

Thanks.

Best answer

The domain / sample space $\Omega$ needs to be rich enough to encompass all possible outcomes of your experiment. Remember that after you conduct the experiment, you will have in front of you just one element $\omega$ of the sample space $\Omega$.

A reasonable domain $\Omega$ for conducting sampling is the set of all possible sequences of individuals that could result from your sampling. For example, if your experiment samples $100$ individuals from a population of people, or a box of tickets, or an urn of marbles, then take $\Omega$ to be the set of all possible vectors of size $100$ from this population, conformant with your sampling protocol. (If your sampling is performed without replacement, then those vectors should not contain duplicates.)
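To connect this with the question's setup, here is a sketch for a single sample of $25$ students. The symbol $S$ (the set of all $3{,}000$ students) is notation introduced here, not part of the question. If the sampling is without replacement, one possible choice is

$$\Omega=\{(\omega_1,\ldots,\omega_{25})\in S^{25} : \omega_i\neq\omega_j \text{ for } i\neq j\},$$

which has $3000\cdot 2999\cdots 2976$ elements. If the sampling is with replacement, one would instead take $\Omega=S^{25}$. In either case a single outcome $\omega$ is an entire ordered sample, not an individual student.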

Given this sample space, it is natural to associate $X_i(\omega)$ with whatever measurement you're taking on the $i$th member of your vector $\omega$, and it is straightforward to perform numerical manipulation of your random variables $X_1,\ldots,X_n$.
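The construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights are made-up numbers, sampling is without replacement, and the names `population`, `draw_omega`, and `X` are hypothetical helpers introduced here.

```python
import random

random.seed(0)

# Hypothetical population: student id -> height in cm (made-up values).
population = {student: random.gauss(170, 10) for student in range(3000)}

def draw_omega(n=25):
    """Draw one outcome omega: an ordered tuple of n distinct students."""
    return tuple(random.sample(list(population), n))

def X(i, omega):
    """Random variable X_i: height of the i-th student in omega (1-indexed)."""
    return population[omega[i - 1]]

# One run of the experiment yields a single omega; every X_i is defined on it.
omega = draw_omega()
total = sum(X(i, omega) for i in range(1, 26))
sample_mean = total / 25
```

Each `X(i, ·)` takes the whole tuple `omega` as its argument, so all $25$ random variables share one domain and their sum is defined for every outcome. Repeating `draw_omega()` 80 times would give the 80 samples from the question, each one a separate draw from the same $\Omega$.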