Assume you have a two-armed bandit in which one arm has a fixed, known payoff probability $p = 0.6$ and the other has an unknown payoff probability $q$, drawn uniformly from $[0,1]$. In each game the player pulls an arm $N$ times, and $q$ is revealed to the player at the start of the game. The player will obviously choose the arm with the higher probability, so the rule is: play the arm with payoff probability $\max(p,q)$. What is the expected payoff at the start of each game, if each successful pull pays 1 unit?
Intuitively, 60% of the time $q \leq p$, and the player will choose the $p$ arm. In the remaining 40% of games, $q > p$, and the player will choose the $q$ arm, so the expected payoff per pull must be greater than 0.6.
I'm trying to calculate $E[\max(p,q)]$ formally. I tried this:
$E[\max(p,q)] = \int\max(p,q) \times q \times 1 dq$ (we assume payoff of $1$ which drops out)
Since $q \in [0,1]$ and $p$ is fixed and known in advance, we only need to integrate with respect to $q$ over $[0,1]$:
$$ E[\max(p,q)] = \int_{0}^{1}\max(p,q) \times q \, dq $$
yielding:
$$ E[\max(p,q)] = \int_{0}^{0.6} \max(p,q)qdq + \int_{0.6}^{1}\max(p,q)qdq = 0.6(q^2)\big|_{0}^{0.6} + (q^2)\big|_{0.6}^{1} $$
which looks wrong (the textbook says it is 0.68 and gives no explanation). Can you show the correct, full formal derivation using expectations, and also give some intuition for getting the answer without a formal calculation?
It is best to clearly define the random variable of interest. Here, it is the payoff $X$ consisting of the sum of $N$ independent trials $I_1, I_2, \ldots, I_N$, where each $I_k$ is drawn from a Bernoulli distribution with probability

$$\Pr[I_k = 1] = \max(0.6,q), \quad k = 1, 2, \ldots, N.$$

Therefore,

$$X \sim \operatorname{Binomial}(N, \max(0.6,q)).$$

But $q \sim \operatorname{Uniform}(0,1)$ is itself a random variable, so the conditional expectation, given $q$, is

$$\operatorname{E}[X \mid q] = \begin{cases} Nq, & q > 0.6, \\ 0.6N, & q \le 0.6. \end{cases}$$

Hence the unconditional expectation of $X$ is given by the iterated (or double) expectation formula:

$$\begin{align*} \operatorname{E}[X] &= \operatorname{E}[\operatorname{E}[X \mid q]] \\ &= \operatorname{E}[Nq \mid q > 0.6]\Pr[q > 0.6] + (0.6 N)\Pr[q \le 0.6] \\ &= N\cdot \frac{0.6 + 1}{2} (1 - 0.6) + (0.6N)(0.6) \\ &= 0.68N. \end{align*}$$
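As a sanity check on the $0.68N$ figure, here is a minimal Python sketch that compares a closed form with a Monte Carlo estimate. The function names `expected_payoff_exact` and `expected_payoff_simulated`, the seed, and the number of simulated games are all illustrative choices, not part of the problem. The closed form assumes only that the density of $q \sim \operatorname{Uniform}(0,1)$ equals $1$ on $[0,1]$, so that $\operatorname{E}[\max(p,q)] = p \cdot p + \int_p^1 q \, dq = (1 + p^2)/2$.

```python
import random


def expected_payoff_exact(p: float, n_pulls: int) -> float:
    """Closed form N * E[max(p, q)] for q ~ Uniform(0, 1).

    E[max(p, q)] = p * Pr[q <= p] + integral of q over (p, 1]
                 = p^2 + (1 - p^2) / 2 = (1 + p^2) / 2.
    """
    return n_pulls * (1 + p * p) / 2


def expected_payoff_simulated(p: float, n_pulls: int, n_games: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the average payoff per game (illustrative helper)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_games):
        q = rng.random()      # q is revealed at the start of the game
        best = max(p, q)      # the player commits to the better arm
        # Each of the N pulls succeeds independently with probability `best`.
        total += sum(rng.random() < best for _ in range(n_pulls))
    return total / n_games


if __name__ == "__main__":
    print("exact:    ", expected_payoff_exact(0.6, 10))
    print("simulated:", expected_payoff_simulated(0.6, 10, n_games=20_000))
```

With $p = 0.6$ and $N = 10$ the closed form gives $6.8$, and the simulated average should land close to that value, up to sampling noise. Note that the integrand in the closed form is $\max(p,q)$ times the density $1$, with no extra factor of $q$.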