I have recently been trying to understand the proofs of convergence and the regret analyses for multi-armed bandits. These proofs seem to draw on a variety of mathematical tools, such as measure theory, sampling statistics, convergence bounds, etc.
I am trying to understand papers such as:
- http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf
- http://www.jmlr.org/papers/volume3/auer02a/auer02a.pdf
- https://arxiv.org/pdf/1204.1909.pdf
- http://www.economics.uci.edu/~ivan/asmb.874.pdf
I know the basics of probability and statistics up to undergraduate level. However, I am having difficulty fully understanding these proofs. Any resources, such as a book or a course, that I should work through before tackling these proofs would be very helpful.
Thanks.