Multi-armed bandit with infinitely-many arms

600 Views Asked by At

Has anyone studied variants of the multi-armed bandit algorithm with infinitely many arms?

I have a collection of distributions parametrized by an integer $n$. Unfortunately, I can't analytically determine which distribution will have the highest mean, but I have a guess that, say, a value of $n$ near 10 is best. (So, it would be very surprising if it turned out that, say, 1000 were better.) I can phrase this guess as a prior distribution; is there a way to use the prior in a bandit algorithm?

Pursuit algorithms can incorporate a prior, and it looks like they should work with infinitely many arms (for example, they don't start by sampling every arm).

Has anyone studied this extension? Or a different mutli-armed bandit algorithm with infinitely many arms?

1

There are 1 best solutions below

0
On

This is a rather late answer, but hopefully still helpful. For bandit problems with infinitely many arms, there is some work. A good starting point might be $\chi$-Armed Bandits by Bubeck et al. You can find more references therein. Specifically, they deal with a variant where the bandit arms are in a generic measurable space, and the reward function has certain "smoothness" properties. They HOO algorithm they propose is also not too hard to implement if you want to try it out.