The probability of choosing among options $X_1$, $X_2$, $X_3$, $...X_n$ is initially uniform, i.e. $P(X_j)=1/n$. On choosing $X_j$, either success or failure will occur (with unknown probabilities, but irrelevant to this question), and $P(X_j)$ immediately becomes $$\frac{\text{Past Successes of }X_j}{\text{Past Successes of }X_j + \text{Past Failures of }X_j}$$
while the probability of the rest of the options diminish proportionally, to normalize the space.
I know in general this is known as "reinforcement learning," but I'm wondering if there's a specific name for this exact, naive algorithm (or the class of such in which this is an instance).
Allow me to walk through a quick example:
- We have options A, B, and C, the probability of each being, initially, $\frac{1}{3}$.
- We choose option B, and it succeeds. Therefore, the probability of B is now $\frac{1}{1+0}$, and the probabilities of A and C adjust to $0$ in order to normalize the probability space to 1.
- We then choose, helplessly, option B again, since its probability is 1. This time, it fails. Therefore, the probability of B is now $\frac{1}{1+1}$, and the probabilities of A and C adjust to $\frac{1}{4}$.
- We then choose, by chance, option C, and it fails, too. Therefore, the probability of C is now $\frac{0}{0+1}$, and the probabilities of A and B, formerly $\frac{1}{4}$ and $\frac{1}{2}$, adjust to $\frac{1}{3}$ and $\frac{2}{3}$.
And so on and so forth.
Is there a specific name by which this algorithm is known?
(And in case I'm curious later, as I wouldn't like to ask a separate question for this, is there a name for this algorithm but with decaying "memory" of past successes and failures? I imagine it'd simply be descriptive, e.g. "__ with a decaying function." But, it'd be great if there was a name by which an audience would know exactly the system I'm talking about.)
Thank you in advance.