[data generating process]-[sampling from an infinite population]-[i.i.d.]: some clarifications


I am confused about the relation among the three concepts: [data generating process], [sampling from an infinite population], and [i.i.d.]

Could you tell me whether what I wrote below is a correct interpretation, and if not, what is wrong?

It may look like a silly and overcomplicated question, but in my actual problem I have some peculiar data generating processes, and I want to be sure I understand every step correctly in the easiest cases.

I could not find any source relating the three concepts in a clear way: many statistics books explain the concepts of sampling from an infinite population and i.i.d. very well but ignore data generating processes, and vice versa for many econometrics books.


CLASSICAL CASE:

  • Consider an infinite population of individuals; each individual $i$ in the population is endowed with some characteristics $(y^i, x^i, u^i)$ and it holds that $y^i=\beta x^i+u^i$ (for example, $y^i$ is income, $x^i$ is education, $u^i$ are other variables unobserved by the researcher).

  • Suppose that the researcher wants to get a good approximation of average income and education.

  • In order to do so, we need to construct a representative finite sample.

  • This is handled by defining some random variables $Y,X,U$. For each individual $i$ in the population, $(y^i, x^i, u^i)$ represents a draw from $p_{Y,X,U}$,

    $(\star) \hspace{1cm}$ such that the chance of extracting any given value $(y,x,u)$ is fixed and equal to $p_{Y,X,U}(y,x,u)$ at each extraction, $\forall y,x,u$.

  • Being interested in average income and education means being interested in $E_{p_{Y,X}}(Y,X)$.

  • In order to obtain a representative finite sample, the researcher draws $M$ realisations of $(Y,X)$ from $p_{Y,X}$, such that the chance of extracting any given value $(y,x)$ is fixed and equal to $p_{Y,X}(y,x)$ at each extraction, $\forall y,x$. The researcher registers the extracted realisations of $(Y,X)$. Let the realisation of $(Y,X)$ at the $m$th extraction be denoted by $(y_m, x_m)$.

  • $(y_m, x_m)$ can be thought of as the realisation of a random vector $(Y_m, X_m)$ $\forall m \in \{1,...,M\}$, such that, given the adopted extraction scheme, $(Y_m,X_m) \sim (Y,X) \sim p_{Y,X}$ $\forall m \in \{1,...,M\}$ and $\{(Y_m, X_m)\}_{m=1}^M$ are mutually independent across $m$.

  • The researcher is now ready to compute $\bar{y}\equiv \frac{1}{M}\sum_{m=1}^M y_m$ and $\bar{x}\equiv \frac{1}{M}\sum_{m=1}^M x_m$. By the weak LLN, for large $M$, $(\bar{y}, \bar{x})$ is a good approximation of $E_{p_{Y,X}}(Y,X)$.
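The steps of the classical case can be sketched in a short simulation. All distributional choices below ($X \sim N(12,2)$, $U \sim N(0,1)$, $\beta = 2$) are hypothetical, picked only to illustrate that i.i.d. sample means approximate the population means:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (assumptions for illustration):
# education X ~ Normal(12, 2), unobservables U ~ Normal(0, 1), beta = 2.
beta = 2.0
M = 100_000  # number of i.i.d. draws from p_{Y,X,U}

x = rng.normal(12, 2, size=M)   # i.i.d. draws of X
u = rng.normal(0, 1, size=M)    # i.i.d. draws of U, independent of X
y = beta * x + u                # implied draws of Y

# Sample means (y_bar, x_bar) approximate E(Y), E(X) by the weak LLN.
y_bar, x_bar = y.mean(), x.mean()
# Under the assumed distributions, E(X) = 12 and E(Y) = beta*12 + 0 = 24.
```

With $M = 100{,}000$ draws, $\bar{x}$ and $\bar{y}$ land within a few hundredths of $E(X)=12$ and $E(Y)=24$, as the weak LLN predicts.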


DEPENDENCE I

  • Suppose that now, instead of $y^i=\beta x^i+u^i$, it holds that $y^i=\alpha y^{i-1}+u^i$

  • The procedure above for extracting a finite representative sample fails because $(\star)$ no longer holds, given the functional dependence of $y^i$ on $y^{i-1}$.

  • Given that failure, what people do is think of $y^i$ as one realisation of a random variable $Y^i$ for each $i$, and of the sequence $\{Y^i\}_{i}$ as a stochastic process.

  • If one has many populations, then the steps above can be followed (just at the population level instead of at the individual level). Otherwise, if one has one population only, it may still be worth selecting a finite sample from it, e.g. $(y^1,..., y^M)$ with $y^0\equiv 0$, computing $\bar{y}\equiv \frac{1}{M}\sum_{m=1}^M y^m$, and seeing whether this approximates some object of interest according to some alternative LLNs (e.g. for stationary ergodic processes).
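A minimal sketch of the single-population case, assuming a stationary AR(1) with a hypothetical $\alpha = 0.5$ and standard normal $u^i$: the draws are serially dependent, so they are not i.i.d., yet the sample mean still settles down, as an ergodic LLN predicts:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical DGP: y^i = alpha * y^{i-1} + u^i, |alpha| < 1, y^0 = 0.
alpha = 0.5
M = 200_000
u = rng.normal(0, 1, size=M)

y = np.empty(M)
prev = 0.0
for i in range(M):
    prev = alpha * prev + u[i]  # each draw depends on the previous one
    y[i] = prev

# The sequence is NOT i.i.d., but for |alpha| < 1 it is stationary and
# ergodic, so an LLN for dependent data still gives y_bar -> E(Y) = 0.
y_bar = y.mean()
```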


DEPENDENCE II

  • Suppose that now, instead of $y^i=\beta x^i+u^i$, it holds that $y^i=\beta x^i+(u^i \cdot u^{i-1})$

  • Are we back to the case of DEPENDENCE I?
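Not exactly, it would seem: assuming hypothetical standard normal $u^i$, the error $e^i \equiv u^i \cdot u^{i-1}$ is 1-dependent (each term shares one factor with its neighbour), which is a weaker, finite-memory form of dependence than the recursion in DEPENDENCE I. In fact its lag-1 correlation is zero, while the squares are correlated, so the errors are uncorrelated but not independent. A quick simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical error process: e^i = u^i * u^{i-1}, u^i i.i.d. N(0,1).
M = 200_000
u = rng.normal(0, 1, size=M + 1)
e = u[1:] * u[:-1]            # e[i] shares u^i with e[i+1]: 1-dependent

def lag1_corr(z):
    """Sample correlation between z_t and z_{t+1}."""
    return np.corrcoef(z[:-1], z[1:])[0, 1]

corr_e  = lag1_corr(e)        # close to 0: e is uncorrelated at lag 1
corr_e2 = lag1_corr(e ** 2)   # clearly positive: e is NOT independent
```

Because the dependence dies out after one lag (the process is m-dependent), LLNs for m-dependent or mixing sequences apply directly, without the recursive structure of the AR case.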