Is repeated sampling from the same probability distribution automatically independent?

Suppose you have a probability distribution (discrete, for simplicity), e.g. $\Omega=\{a,b,c\}$ with $\mathbb{P}(a)=0.2$, $\mathbb{P}(b)=0.3$ and $\mathbb{P}(c)=0.5$.

Suppose I have some device (e.g. a computer program, such as the numpy library's randint() function) that contains the description "This will provide you with as many random samples from $\mathbb{P}$ as you want". Applying this device I obtain the sequence of such samples $x_1,\ldots,x_n\in\Omega$.

How can I prove that these were generated in an independent way? Or how can I at least determine the probability that these were generated in an independent way? Is it even possible to do that, or is my question actually meaningless?

(Independence is a concept that is only defined for random variables or events (as far as I know), so what would the random variables or events be that I need to consider to make the previous question formal?)

Please note: I know graduate-level mathematics (think: measure theory), but I have trouble connecting the abstract machinery that I know to the real world, where you actually deal with samples and such.

There are 2 best solutions below

0
On BEST ANSWER

Your question seems to boil down to testing whether a distribution $p$ over a product space $\prod_{n=1}^\infty \Omega$ is a product distribution, under the assumption that all its marginals are equal, and given the ability to get exactly one sample from $p_N$ (defined as the marginal of $p$ on $\prod_{n=1}^N \Omega$) for your choice of $N$ (where I assume you can choose $N$ randomly yourself as well).

That is, there is a single (unobserved) realization $x\in \Omega^\infty$ of a random variable $X\sim p$. Your task is to choose $N \in \mathbb{N}$, upon which you observe the projection $\pi_N(x)$ of $x$ on $\Omega^N$ (which is thus distributed according to $p_N$). Your goal is to distinguish between the cases (i) $p$ is of the form $q\times \dots\times q\times\dots$ for some probability distribution $q$ over $\Omega$, and (ii) $p$ is not equal to any such product distribution with the same marginals.
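To make the two cases concrete, here is a minimal sketch in Python. The names `iid_sample` and `sticky_sample`, the `stay` parameter, and the integer encoding of $a,b,c$ as 0, 1, 2 are assumptions made for illustration; they do not come from the question. Both samplers have every coordinate distributed according to $q$, but only the first corresponds to a product distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.array([0.2, 0.3, 0.5])  # P(a), P(b), P(c), encoded as 0, 1, 2

def iid_sample(n):
    # Case (i): every coordinate is a fresh, independent draw from q.
    return rng.choice(3, size=n, p=q)

def sticky_sample(n, stay=0.5):
    # Case (ii): with probability `stay` repeat the previous value,
    # otherwise take a fresh draw from q. If x[i-1] ~ q then x[i] ~ q,
    # so all marginals equal q, yet consecutive values are dependent.
    fresh = rng.choice(3, size=n, p=q)
    keep = rng.random(n) < stay
    x = fresh.copy()
    for i in range(1, n):
        if keep[i]:
            x[i] = x[i - 1]
    return x

n = 20000
for name, x in [("iid", iid_sample(n)), ("sticky", sticky_sample(n))]:
    freqs = np.bincount(x, minlength=3) / n          # empirical marginal
    repeat_rate = float(np.mean(x[1:] == x[:-1]))    # how often x[i] == x[i-1]
    print(name, freqs.round(3), round(repeat_rate, 3))
```

The empirical marginals of both sequences should be close to $q$, while the rate of immediate repeats differs (about $\sum_i q_i^2 = 0.38$ for the independent sampler versus about $0.69$ for the sticky one). This particular dependence is easy to spot from one long realization; a dependence that only links, say, the first coordinate to a very distant one would not be.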

As mentioned in a comment, I would gather that you cannot do much unless you make extra assumptions on $p$.

0
On

The output of numpy.random.randint is actually deterministic. You can verify this by calling numpy.random.seed before each call to numpy.random.randint, using the same parameter each time you call seed and the same list of parameters each time you call randint. You will get the same results every time. The results are only "pseudorandom", not truly random as we would usually interpret the meaning of "random."
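The check described above can be sketched as follows. The helper name `draw` and the seed value 12345 are arbitrary choices for illustration.

```python
import numpy as np

def draw(seed, n=10):
    # Reset NumPy's legacy global generator, then draw n integers from {0, 1, 2}.
    np.random.seed(seed)
    return np.random.randint(0, 3, size=n)

first = draw(12345)
second = draw(12345)

# Same seed and same arguments give exactly the same "random" sequence.
print(np.array_equal(first, second))  # True
```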

This is a useful feature when you're trying to debug a program that depends on "random" input, because it means you can make an exactly repeatable test of the program.

If you do not reset the random number generator by calling seed, successive outputs of numpy are designed to behave like random variables that are independent of each other.

There is no way to prove that the variables are all truly independent. If you pull a long enough sequence from numpy then I believe the results will not be independent, since a generator with a finite internal state must eventually repeat itself. There are various tests that people apply to RNGs to check whether they are sufficiently "independent" (in a pseudorandom sense), but all you get from these is something like, "The outcomes are not too obviously dependent in this particular way."
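As one illustration of such a test, here is a minimal sketch of a chi-square test on adjacent pairs, written with NumPy only. The function name `pair_chi2` is hypothetical, the symbols are assumed to be encoded as integers `0..k-1`, and every symbol is assumed to occur in both positions so that no expected count is zero. This is just one of many possible checks, not the procedure used by any particular RNG test suite.

```python
import numpy as np

def pair_chi2(x, k):
    # Split x into non-overlapping pairs (x[0], x[1]), (x[2], x[3]), ...
    x = np.asarray(x)
    m = len(x) // 2
    first, second = x[0:2 * m:2], x[1:2 * m:2]
    # k-by-k contingency table of how often each pair of symbols occurs.
    table = np.zeros((k, k))
    np.add.at(table, (first, second), 1)
    # Expected counts if the two members of a pair were independent.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / m
    return float(((table - expected) ** 2 / expected).sum())

rng = np.random.default_rng(2024)
x = rng.choice(3, size=10000, p=[0.2, 0.3, 0.5])
stat = pair_chi2(x, k=3)

# Under independence the statistic is approximately chi-square distributed
# with (3 - 1)^2 = 4 degrees of freedom; 9.488 is its 95% quantile.
verdict = "reject at the 5% level" if stat > 9.488 else "no evidence of pairwise dependence"
print(round(stat, 3), verdict)
```

A small statistic only says that adjacent values do not look dependent in this one specific way; it says nothing about dependence at longer lags or of other kinds, which is exactly the limitation described above.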