Everything I can find on the Gittins Index is extremely in depth and abstract with almost no examples. I have spent hours digging through academic papers, lecture notes, and Wikipedia sources. I understand the Gittins Index conceptually, but I would like to include it in program, so I need to know how to compute it (even if the algorithm has a complexity of n factorial).
Is there anyone who can actually solve a simple example like the one below?
This is a version of the classic multi-armed-bandit problem:
I'm at a casino, there are 3 machines M_1, M_2, M_3
Every machine pays out \$1 for a win, \$0 for a loss
I've played M_1 three times, it has 2 wins 1 loss
I've played M_2 four times, it has 2 wins 2 losses
I've played M_3 two times, it has 1 win 1 loss
If I discount future payouts by 50%;
What is the Gittins Index of M_1?
(An actual number in decimal form)
What steps do you take to get that number?
(Pseudo code would be great)
Thanks for at least reading the problem