Does this random variable have the same distribution as an iid sum?


Consider the so-called multi-armed bandit problem: there are i.i.d. random vectors $X_1,X_2,\ldots \in \mathbb R^d$ with expectation $(\mu_1,\ldots, \mu_d)$. On each turn $n$ I select some coordinate (arm) $I_n \in \{1,2,\ldots, d\}$ and am allowed to observe (pull) the $I_n$-th coordinate of $X_n$. Call that observation $Y_n$.

On the following turn $n+1$ I am allowed to choose $I_{n+1}$ based on the past observations $Y_1,Y_2,\ldots, Y_n$. We proceed like this: each turn a new vector is generated and I see only one component. Usually the goal is to find a good arm, but that is not important for this question.

What I am interested in is the distribution of the empirical mean after $n$ pulls of arm $1$, say. Of course this is not well defined as stated, since I might never pull arm 1 enough times. For simplicity let's suppose $\mu_1 =0$. To make sense of the empirical mean, define $Z_1$ as the observation on the first turn with $I_n =1$ if such a turn exists, and $Z_1 =0$ if I never pull arm 1. Likewise let $Z_2$ be the second observation of arm $1$ if it exists, and $Z_2 =0$ otherwise. Then we can define

$$\hat \mu_1(n) = \frac{1}{n} \sum_{i=1}^n Z_i $$
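As a concrete illustration of this construction, here is a minimal Python simulation sketch. The arm distributions (standard normal, so $\mu_1 = 0$) and the greedy pulling rule are hypothetical choices made only for the example; any rule measurable with respect to the past would do.

```python
import random

def simulate_empirical_mean(n, d=3, seed=0):
    """Simulate n turns of the bandit and return hat{mu}_1(n).

    Assumptions (illustrative only): every coordinate of X_t is standard
    normal, and the pulling rule is a toy greedy rule.
    """
    rng = random.Random(seed)
    sums = [0.0] * d      # running sum of observations per arm
    counts = [0] * d      # number of pulls per arm
    z = []                # observations of arm 1 (index 0), in order

    for t in range(n):
        # Toy rule: pull each arm once, then pull the arm with the
        # highest empirical mean so far.
        if min(counts) == 0:
            arm = counts.index(0)
        else:
            arm = max(range(d), key=lambda a: sums[a] / counts[a])
        y = rng.gauss(0.0, 1.0)   # observed coordinate of X_t
        sums[arm] += y
        counts[arm] += 1
        if arm == 0:
            z.append(y)

    # Pad with zeros: Z_i = 0 when arm 1 was pulled fewer than i times.
    z += [0.0] * (n - len(z))
    return sum(z) / n

print(simulate_empirical_mean(1000))  # one realization of hat{mu}_1(1000)
```

Note that the zero-padding step is exactly the definition of $Z_i$ above: the returned value is always an average over $n$ terms, however few times arm 1 was actually pulled.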

It's straightforward to show the above has mean zero. I wonder: is there a straightforward way to show it satisfies the same concentration inequalities as $\frac{1}{n} \sum_{i=1}^n W_i $, where the $W_i$ are i.i.d. and distributed like the first component of $X_1,X_2,\ldots$? More specifically:

$$P \left( \hat \mu_1(n) > \epsilon \right) \le P\left( \frac{1}{n} \sum_{i=1}^n W_i > \epsilon\right) \text{ for all }\epsilon >0.$$

I am sure $\hat \mu_1(n)$ does not have the same distribution as $\frac{1}{n} \sum_{i=1}^n W_i $, because, for example, we might use a pulling rule that never pulls arm $1$. Then $\hat \mu_1(n)$ is identically zero, but that is fine here, since it only makes the concentration inequality easier to satisfy.
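As a numerical sanity check of the conjectured inequality (not a proof), here is a hedged Monte Carlo sketch. It again assumes standard-normal arms and a toy greedy rule, both hypothetical, and simply estimates the two tail probabilities side by side.

```python
import random

def tail_probs(n=50, eps=0.2, trials=2000, d=2, seed=1):
    """Monte Carlo estimates of P(hat{mu}_1(n) > eps) and
    P(mean of n i.i.d. W_i > eps), under illustrative assumptions:
    standard-normal arms (so mu_1 = 0) and a toy greedy pulling rule."""
    rng = random.Random(seed)
    hit_bandit = 0
    hit_iid = 0
    for _ in range(trials):
        # Bandit side: build hat{mu}_1(n) with zero-padding.
        sums, counts, z = [0.0] * d, [0] * d, 0.0
        for t in range(n):
            if min(counts) == 0:
                arm = counts.index(0)
            else:
                arm = max(range(d), key=lambda a: sums[a] / counts[a])
            y = rng.gauss(0.0, 1.0)
            sums[arm] += y
            counts[arm] += 1
            if arm == 0:
                z += y   # padding with zeros changes nothing in the sum
        if z / n > eps:
            hit_bandit += 1
        # I.i.d. side: mean of n independent copies of the first coordinate.
        if sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n > eps:
            hit_iid += 1
    return hit_bandit / trials, hit_iid / trials
```

Running `tail_probs()` returns the two estimated tail probabilities; with the zero-padding and fewer effective pulls of arm 1, the bandit-side estimate should typically come out no larger than the i.i.d. one, consistent with the conjecture.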

I have a rather lengthy proof using martingales that the inequality does in fact hold. But I wonder: am I making this more complicated than it needs to be? Is there an easier approach?