Using biased samples to get tighter concentration bounds

Suppose $X_1, X_2, \ldots, X_N$ are sampled from a fixed distribution $P_X$. Under various assumptions, a concentration bound of the form $\left|\frac{1}{N}\sum_{i=1}^N X_i- \mathbb{E}(X)\right|\leq f(N,\delta)$ holds with probability at least $1-\delta$.

Suppose there is another random variable $Y$ whose expectation is close to that of $X$, e.g., $|\mathbb{E}(X)-\mathbb{E}(Y)|\leq \epsilon$. Given $M$ samples of $Y$, can we obtain a tighter concentration bound for the mean of $X$ by also using the samples from $Y$?

Intuitively, if $X$ and $Y$ are identically distributed, then adding the samples from $Y$ gives a better estimate of $\mathbb{E}(X)$ and hence a tighter bound.
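
To see where the bias enters, here is a rough sketch under assumptions that are not part of the question: all $N+M$ samples are independent and take values in $[0,1]$. Write $\hat\mu = \frac{1}{N+M}\left(\sum_{i=1}^N X_i+\sum_{i=1}^M Y_i\right)$ for the pooled average (the symbol $\hat\mu$ is introduced here only for illustration). Since $\mathbb{E}(\hat\mu) = \frac{N\,\mathbb{E}(X)+M\,\mathbb{E}(Y)}{N+M}$, the triangle inequality gives

$$\left|\hat\mu-\mathbb{E}(X)\right| \leq \left|\hat\mu-\mathbb{E}(\hat\mu)\right| + \frac{M}{N+M}\left|\mathbb{E}(Y)-\mathbb{E}(X)\right| \leq \left|\hat\mu-\mathbb{E}(\hat\mu)\right| + \frac{M\epsilon}{N+M}.$$

Under the stated assumptions, Hoeffding's inequality bounds the first term by $\sqrt{\frac{\log(2/\delta)}{2(N+M)}}$ with probability at least $1-\delta$, so the pooled bound is smaller than the $X$-only Hoeffding bound $\sqrt{\frac{\log(2/\delta)}{2N}}$ exactly when

$$\frac{M\epsilon}{N+M} < \sqrt{\frac{\log(2/\delta)}{2N}} - \sqrt{\frac{\log(2/\delta)}{2(N+M)}}.$$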

In other words, under what assumptions does the following inequality hold with high probability? $$\left|\frac{1}{N+M}\left(\sum_{i=1}^N X_i+\sum_{i=1}^M Y_i\right)- \mathbb{E}(X)\right| \leq \left|\frac{1}{N}\sum_{i=1}^N X_i- \mathbb{E}(X)\right|$$
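
A quick way to probe this inequality numerically is a small Monte Carlo experiment. The sketch below is only an illustration under assumptions of my own: $X$ and $Y$ are Gaussian with unit variance, $\mathbb{E}(X)=0$, and $\mathbb{E}(Y)$ equals a chosen `bias`. The function name `pooled_vs_plain` and all of its parameters are hypothetical. It estimates how often the pooled average lands at least as close to $\mathbb{E}(X)$ as the $X$-only average.

```python
import random


def pooled_vs_plain(n, m, bias, trials=2000, seed=0):
    """Fraction of trials in which the pooled mean is at least as close
    to E[X] as the mean of the X samples alone.

    Assumed setup (not from the question): X ~ N(0, 1), Y ~ N(bias, 1),
    all samples independent.
    """
    rng = random.Random(seed)
    mu_x = 0.0  # true mean of X in this toy setup
    wins = 0
    for _ in range(trials):
        xs = [rng.gauss(mu_x, 1.0) for _ in range(n)]
        ys = [rng.gauss(mu_x + bias, 1.0) for _ in range(m)]
        plain_err = abs(sum(xs) / n - mu_x)
        pooled_err = abs((sum(xs) + sum(ys)) / (n + m) - mu_x)
        if pooled_err <= plain_err:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Sweep the bias E[Y] - E[X] for a small X sample and a large Y sample.
    for bias in (0.0, 0.1, 0.5, 2.0):
        print(bias, pooled_vs_plain(n=20, m=200, bias=bias))
```

Sweeping `bias` for fixed `n` and `m` gives a feel for how small $\epsilon$ has to be, relative to the sampling noise of the $X$-only average, before pooling stops helping in this particular setup.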