Formula for estimating a continuous random variable in sampling, when most classical theory is discrete


Very often we would like to estimate a continuous variable Y (e.g. mean weight, mean length) in a sampling design. However, most of the literature in sampling theory seems to treat the sampling variable as discrete. The most common estimator of the mean of y from a sample of size n is:

$E(Y)=\sum_{i=1}^{n} y_{i} p(y_{i})$

Now consider a slightly more complicated stratified sampling design with strata. A stratum is usually a discrete random variable (e.g. site, district). We often also estimate the mean of Y within a given stratum. This involves the conditional distribution of a continuous variable y, conditional on a discrete variable.
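For concreteness, the standard stratified estimator $\bar{y}_{st}=\sum_h (N_h/N)\,\bar{y}_h$ (weighting each within-stratum sample mean by the stratum's population share) can be sketched in Python; the site names, stratum sizes, and measurements below are made up purely for illustration:

```python
import statistics

# Hypothetical data: stratum name -> (stratum population size N_h, sampled values)
strata = {
    "site_A": (500, [2.1, 2.4, 2.0]),
    "site_B": (300, [3.5, 3.1]),
}

# Total population size N across all strata
N = sum(N_h for N_h, _ in strata.values())

# Stratified estimate of the overall mean: sum over strata of W_h * ybar_h,
# where W_h = N_h / N is the stratum weight and ybar_h the within-stratum sample mean
y_bar_st = sum((N_h / N) * statistics.mean(ys) for N_h, ys in strata.values())
```

Each $\bar{y}_h$ plays the role of a conditional mean of y given the stratum, and the weights $W_h$ act as the (discrete) probabilities of the strata.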

Can someone give me more insight into the following:

1) On what rationale is the Y variable treated as discrete in sampling theory? Is it because the selection probabilities of samples are discrete (and this is what we have), or is it just to reduce computational complexity?

2) Is it possible to write the estimation formula with Y as a continuous random variable?

3) If 2) is true, in what situations would the discrete formula approach the continuous formula?

My key puzzle is how to link classical estimation in sampling theory to a probability-theory-based formula, if I want to highlight that my sampling variable is continuous.

thanks!

Best Answer

I think there are three distinct concepts of mean you are talking about.

(1) For a continuous random variable $Y$ supported on $\Omega$ with density $f_Y(y)$, the mean is written: $$ \mathbb{E}[Y] = \int_\Omega y f_Y(y)dy $$
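As a quick numerical sanity check of (1), a midpoint-rule approximation of $\int_0^1 y \cdot 1\, dy$ for $Y \sim \mathrm{Uniform}(0,1)$ (a distribution chosen here purely for illustration, with $f_Y(y)=1$ on $[0,1]$) recovers the true mean $0.5$:

```python
# Midpoint-rule approximation of E[Y] = integral of y * f_Y(y) dy
# for Y ~ Uniform(0, 1), where f_Y(y) = 1 on [0, 1]; true mean is 0.5
n = 10_000
h = 1.0 / n  # width of each subinterval
approx = sum(((i + 0.5) * h) * 1.0 * h for i in range(n))
```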

(2) For a discrete random variable $X$ with pmf $p(x)$, taking values $\{x_i\}$ the mean is given by $$ \mathbb{E}[X] = \sum_i x_i\, p(x_i) $$

(3) Given a set of samples $S=\{s_i\}$ realized from a random variable $Z$ (which may be either continuous or discrete), the sample mean is given by $$ \hat{\mu}(S) = \frac{1}{|S|}\sum_{i} s_i $$

Note that (3) is used regardless of whether $S$ comes from a discrete or continuous RV, and in both cases $\hat{\mu}$ converges to $\mathbb{E}[Z]$. The important point is that $\mathbb{E}[Z]$ is the true, theoretical mean of the random variable, whereas $\hat{\mu}$ is an approximation of it.
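A small simulation illustrates that the same sample-mean formula (3) applies in both cases; the two distributions below (a uniform on $[0,1]$ with true mean $0.5$, and a fair die with true mean $3.5$) are just illustrative choices:

```python
import random
import statistics

random.seed(0)
n = 200_000

# Continuous RV: Uniform(0, 1), true mean 0.5
cont_mean = statistics.fmean(random.random() for _ in range(n))

# Discrete RV: fair six-sided die, true mean 3.5
disc_mean = statistics.fmean(random.randint(1, 6) for _ in range(n))
```

In both cases the identical formula $\frac{1}{|S|}\sum_i s_i$ is applied; nothing about it depends on whether $Z$ is discrete or continuous.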


How do we know that $\hat{\mu}$ converges to $\mathbb{E}[Z]$? By the (strong) law of large numbers, which says that (under mild conditions): $$ P\left( \lim_{|S|\rightarrow\infty} \hat{\mu}(S) = \mathbb{E}[Z] \right)=1 $$ In words, as the number of samples increases, the sample mean converges to the true mean with probability 1. It doesn't matter whether $Z$ is continuous or discrete. This means that (3) converges to (1) and (2) as $|S|$ increases.
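The role of $|S|$ can be seen by tracking the running sample mean as more samples arrive; fair die rolls (an arbitrary example, true mean $3.5$) serve as the discrete RV here:

```python
import random

random.seed(1)

# Running sample mean of fair die rolls; by the (strong) law of large
# numbers it should approach E[Z] = 3.5 as the sample size |S| grows.
total = 0.0
running = {}  # sample size -> sample mean at that size
for i in range(1, 100_001):
    total += random.randint(1, 6)
    if i in (100, 10_000, 100_000):
        running[i] = total / i
```

Inspecting `running` at the three checkpoints shows the estimates tightening around 3.5 as $|S|$ increases.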


The other question embedded in yours seems to be how (2) can be seen as a special case of (1).

I'll give a more heuristic (less rigorous) argument, just to show that we can consider a discrete random variable $V$, taking values in $S_v=\{v_i\}$, to be a continuous random variable. Let $U$ be a continuous random variable with density: $$ f_U(u) = \sum_i P(V=v_i)\, \delta(u-v_i) $$ where $\delta$ is the Dirac delta. Notice that $f_U(u)=0$ if $u\notin S_v$, and that $f_U$ places mass $P(V=v_i)$ at each point $v_i$. Further, $$ \int_{-\infty}^\infty f_U(u)\,du = \int_{-\infty}^\infty \sum_i P(V=v_i)\,\delta(u-v_i)\,du = \sum_i P(V=v_i) = 1 $$ as we would expect. Then, for the mean, we get: \begin{align} \mathbb{E}[U] &= \int_{-\infty}^\infty u f_U(u)\, du \\ &= \int_{-\infty}^\infty u \sum_i P(V=v_i)\,\delta(u-v_i)\,du \\ &= \sum_i \int_{-\infty}^\infty u\, P(V=v_i)\,\delta(u-v_i)\,du \\ &= \sum_i v_i\, P(V=v_i) \\ &= \mathbb{E}[V] \end{align} Notice that we have used the definition of $\mathbb{E}[U]$ for a continuous RV and the definition of $\mathbb{E}[V]$ for a discrete RV.

So if you simply redefine any discrete RV as a continuous RV as above, the two definitions of the mean coincide.
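The conclusion $\mathbb{E}[U]=\mathbb{E}[V]=\sum_i v_i\, P(V=v_i)$ can be checked numerically; a fair die (an arbitrary choice of discrete RV) makes the sum easy to verify:

```python
# Pmf of a fair six-sided die: each value has probability 1/6
pmf = {v: 1 / 6 for v in range(1, 7)}

# Formula (2): E[V] = sum_i v_i * P(V = v_i); under the Dirac-density
# construction above, E[U] reduces to exactly this same sum
mean_v = sum(v * p for v, p in pmf.items())
```

For the die this gives 3.5, matching what the integral against the Dirac-delta density would produce.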