How to formally argue that concentration bounds are valid for dependent samples that are drawn "better than" i.i.d.

170 Views Asked by At

I'm looking for references that help me formally argue that, in a special case of non-independent sampling, we can apply concentration bounds as if the sample was i.i.d.. Below is a simple concrete example to explain my question.

Suppose we are interested in some statistic (e.g., the expected value) of a probability distribution represented by the rolls of two 6-sided weighted dice. With probability $p$, dice A is thrown, and we get a number in $\{1, 2, \dotsc, 6\}$ (with some unknown probabilities). With probability $1-p$, dice B is thrown, and we get a number in $\{1, 2, \dotsc, 6\}$ (with some unknown probabilities that are different from A). So the result is always an integer in $\{1, 2, \dotsc, 6\}$.

Now suppose we are given an i.i.d. sample of size $n$ (assume $n$ is even, for reasons that will become apparent soon), and then apply something like Hoeffding's inequality to get a probabilistic bound on the mean of the distribution (or some other concentration bound/statistic), but we are forced to discard half of our sample first (for reasons that are not important for the purposes of this question).

We could of course, randomly partition our sample into 2 equally-sized sets, and throw away one of those sets, which results in the remaining set being an i.i.d. sample, to which we can straightforwardly apply our concentration bounds (let's call this the vanilla procedure). Notice that the vanilla procedure is equivalent to drawing an i.i.d. sample of size $n/2$.

Alternatively, if we know which samples are from A and which are from B, we could split things so we throw away half of the rolls from A and half of the rolls from B. For example, following this procedure, if $n=100$ and we got 60 rolls from A and 40 from B, we would throw away 30 randomly selected rolls from A, and 20 randomly selected rolls from B, leaving us with a 30 A rolls and 20 B rolls to estimate our statistic. We'll call this the alternative procedure below.

In other words, instead of randomly throwing away half of the data, the alternative procedure does something similar to (but not exactly the same as) stratification when deciding what data to discard, in an attempt to reduce variance of the estimator.

Intuitively, while a sample from alternative procedure is not sampled independently, it is clearly "more representative of the distribution in expectation" than an i.i.d. sample obtained from the vanilla procedure (sorry for the informal terms, this is the core of the question and I don't know the correct terms).

(It is trivial to show that the resulting sample results in an unbiased estimator of the mean. Intuitively, it may also result in a lower variance estimator of the mean compared to the vanilla procedure, since we avoid situations where we discard a disproportionate number of the rolls from one of the dice. In other words, we are more likely to get rolls of the dice that more closely match the proportions $p$ and $1-p$.)

So, since this sampling method is "better than i.i.d.", it seems very intuitive that we should be able to use concentration bounds as if the sample from the alternative procedure was sampled i.i.d.. How can I formally argue this idea? What limits are there on this argument in terms of applicability? (I am convinced that this method applies to estimating statistics like the mean, but am not sure if it is generally applicable, e.g., the expected sample variance might be higher compared to that of an i.i.d. sample.) Thank you!

Things I've looked at/considered already (reading this section may be helpful, but is not necessary to answer the question):