Thompson Sampling using Gamma priors


Just a quick question. I've been asked to write a quick example of the Thompson Sampling algorithm for a 2-arm bandit whose rewards are drawn from Poisson($\lambda_i$) random variables.

The original notes used a Bernoulli/Beta distribution conjugacy to demonstrate the theory.

I've already submitted the work, so this is purely because, after further investigation, I think I may have done it wrong.

I'll start with what I have, and then I'll explain why I think it's wrong.

  1. Bandit with 2 arms, labelled 1 and 2.
  2. Arms 1 and 2 have unknown parameters $\lambda_1,\lambda_2$ respectively.
  3. Despite being unknown, we require some prior understanding of these two parameters: let $\Gamma_1(1,1)$ and $\Gamma_{2}(1,1)$ be our priors for each respective arm's unknown parameter.
  4. Begin by sampling a value from these two $\Gamma$ distributions. This is our initial representation of $\lambda_1,\lambda_2$. Call these representations $\hat{\lambda}_1,\hat{\lambda}_2$ respectively.
  5. Compare these two values and play the arm corresponding to the larger of the two, i.e. if $\hat{\lambda}_{1} > \hat{\lambda}_{2}$ play arm 1, otherwise play arm 2. WLOG, assume arm 1 was played.
  6. Compute the posterior for the arm played and update the system. This gives:
     - Arm 1: $\displaystyle{\pi_{1}(\lambda_{1}\mid x) \propto \pi_{0}(\lambda_{1})\,p_{\lambda_{1}}(x)} \sim \Gamma(1+x,1+1) = \Gamma(1+x,2)$.
     - Arm 2 (since arm 2 was not played): $\displaystyle{\pi_{1}(\lambda_{2}) \sim \Gamma(1,1)}$.

     In the above, $\pi_{1}(\lambda_1\mid\cdot)$ denotes the posterior distribution of the unknown parameter $\lambda_1$ at time step 1, and $\pi_{1}(\lambda_{2}\mid\cdot)$ is the same for $\lambda_{2}$.
  7. Return to step 4, but now use our updated posteriors instead of the original priors, and continue.
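For concreteness, here is a minimal sketch of the loop described in steps 4–7, assuming (hypothetically) two fixed true rates for the arms and rewards simulated from them; the Gamma–Poisson conjugate update $\Gamma(\alpha, \beta) \to \Gamma(\alpha + x, \beta + 1)$ is in shape/rate form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true rates for the two arms (unknown to the algorithm).
true_lambdas = [3.0, 5.0]

# Gamma(alpha, beta) priors in shape/rate form; alpha = beta = 1 as above.
alphas = np.array([1.0, 1.0])
betas = np.array([1.0, 1.0])

n_rounds = 1000
pulls = np.zeros(2, dtype=int)

for _ in range(n_rounds):
    # Step 4: sample lambda-hat for each arm from its current posterior.
    # NumPy parameterises the gamma by shape and *scale*, so scale = 1/rate.
    lam_hat = rng.gamma(shape=alphas, scale=1.0 / betas)

    # Step 5: play the arm with the larger sampled rate.
    arm = int(np.argmax(lam_hat))
    pulls[arm] += 1

    # Observe a Poisson reward from the chosen arm.
    x = rng.poisson(true_lambdas[arm])

    # Step 6: conjugate update Gamma(a, b) -> Gamma(a + x, b + 1).
    alphas[arm] += x
    betas[arm] += 1.0

print(pulls)
```

Over many rounds the posterior for the better arm concentrates around its true rate, so that arm ends up being played far more often.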

Now, from my understanding, the original theory used a $Beta(1,1)$ distribution since it's probabilistically identical to a $Uniform(0,1)$ distribution, but it maintains conjugacy with the Bernoulli/binomial distributions and so is more useful in this Bayesian approach.

Above I used a $\Gamma(1,1)$ distribution under a similar rationale, but naively. When I discussed this with others, their response was mostly that it shouldn't matter. I'm unsure whether this is true.

My issue: assuming that uniformity for the initial prior is warranted, a $\Gamma(1,1)$ doesn't give us this, since it's probabilistically the same as an $Exp(1)$ distribution. Should I have used a transformation at some point? I.e., for the priors $X, Y$, should I have actually used $e^{-X}$ and $e^{-Y}$, which would be probabilistically the same as $Uniform(0,1)$? And if so, how would I go about changing the above algorithm to be correct?

Thanks for taking the time to read this.