Optimal Number of Realizations for a Discrete Stochastic Process

I have a question about discrete stochastic processes. Suppose we have a discrete stochastic process $X = \left(x_1, x_2, \ldots, x_i, \ldots, x_N\right)$, i.e., $N$ random variables with unknown statistical distributions. Suppose further that we have $M$ such random vectors, i.e., $M$ realizations of the stochastic process. Is there any mathematical result on the "optimal" value of $M$ needed to infer the statistics of each random variable $x_i$?

Intuitively, I would say that $M$ has to be greater than or equal to $N$, but this is just a guess without any proof. I am not a mathematician, which is why I am asking someone more expert than me.

Thanks in advance!

There is 1 best solution below

The best number of sample values is a full census of the population (i.e., $n=N$). In any problem like this, where you are sampling without replacement from a finite population, sampling $n=N$ values means you have observed every population value, so there is no statistical inference problem left: once you have sampled $n=N$ values there is nothing left to "infer". (And you certainly don't need to keep sampling after you have already observed all the values in the population.)
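
One standard way to make this precise is the variance of the sample mean under simple random sampling without replacement. Here $S^2$ (the population variance with divisor $N-1$) and $\bar{x}_n$ (the mean of the $n$ sampled values) are symbols introduced for illustration, not part of the original answer:

$$\operatorname{Var}\left(\bar{x}_n\right) = \frac{S^2}{n}\left(1 - \frac{n}{N}\right).$$

The factor $1 - n/N$ is the finite population correction. It vanishes at $n=N$, so the census mean has no sampling variance at all, which matches the statement that nothing is left to infer.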

If no cost or other limitation prevents a full census of the population, then a census gives you the most accurate results. You just observe all $n=N$ values, write them down, and then you have a full description of the population. Statistical inference arises when this is infeasible (e.g., when sampling has a cost), so that we need to rely on a sample that is smaller than the full population. Larger values of $n$ (within $0 \leqslant n \leqslant N$) give more data, which gives more information for inferring the remaining unobserved values. Thus, in practice, sample sizes are usually determined by deciding how much data you need to get an inference at a desired level of accuracy, while minimising cost.
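
As a rough illustration of choosing a sample size for a target accuracy, the sketch below finds the smallest $n$ whose confidence interval for the population mean has half-width at most a chosen margin. It assumes simple random sampling without replacement, a normal approximation for the sample mean, and a guessed population standard deviation (for instance from a pilot sample). The function name `required_sample_size` and its parameters are hypothetical, chosen only for this example.

```python
import math
from statistics import NormalDist


def required_sample_size(pop_size, pop_sd, margin, confidence=0.95):
    """Smallest n whose normal-approximation interval for the mean has
    half-width <= margin, under sampling without replacement.

    pop_sd is an ASSUMED (guessed or pilot-estimated) population standard
    deviation, using the N-1 divisor convention.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    # Sample size that would be needed for an infinite population.
    n0 = (z * pop_sd / margin) ** 2

    # Finite population correction: solve
    #   z^2 * (pop_sd^2 / n) * (1 - n / pop_size) <= margin^2   for n.
    n = n0 / (1 + n0 / pop_size)

    # Never ask for more than a full census.
    return min(pop_size, math.ceil(n))


if __name__ == "__main__":
    # Hypothetical numbers: population of 1000 values, guessed sd of 10.
    for margin in (5.0, 1.0, 0.1):
        n = required_sample_size(pop_size=1000, pop_sd=10.0, margin=margin)
        print(f"margin={margin}: sample n={n} of N=1000")
```

Tightening the margin pushes the required $n$ towards $N$, and the result is capped at $N$ because a full census already gives an exact answer.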