Doubt in proof of naive (epsilon,delta) PAC algorithm for multi armed bandits

225 Views Asked by At

I am learning about algorithms used to solve the Multi-armed bandits problem. I am however stuck in understanding the proof for the naive ($\epsilon$,$\delta$) PAC algorithm (Theorem 6 here). Specifically I am confused by the first step of the proof as shown below: proof of naive($\epsilon$,$\delta$) PAC algorithm. I am unable to see how the RHS is an upper bound of the LHS.Any help would be much appreciated.

1

There are 1 best solutions below

0
On BEST ANSWER

If $\hat{p}_{a'} > \hat{p}_{a^*}$, then either $\hat{p}_{a'} > x$ or $\hat{p}_{a^*} < x$, because if neither is true, then $\hat{p}_{a'} < x < \hat{p}_{a^*}$, which would be a contradiction. Just replace $x$ by $r^* - \epsilon/2$ to get:

$P(\hat{p}_{a'} > \hat{p}_{a^*}) \geq P$($\hat{p}_{a'} > r^* - \epsilon/2$ or $ \hat{p}_{a^*} < r^* - \epsilon/2$). Now, note that $E(R(a')) + \epsilon/2 < r^* - \epsilon/2$, so we have

$\{\hat{p}_{a'} > r^* - \epsilon/2 \} \subset \{ \hat{p}_{a'} > E(R(a')) + \epsilon/2 \}$, since if $p > a$, $p > b$ if $a > b$. We further have $P(A) \leq P(B)$ if $A \subset B$ from which that first line follows.