In a jar, there are $r$ red balls, $b$ blue balls, $y$ yellow balls, and $g$ green balls, for a total of $T=r+b+y+g$ colored balls. Let $(r,b,y,g;T)$ or equivalently $(r,b,y,g)$ represent this state.
I select a ball, look at its color, and adjust the contents of the jar according to its color. (Let the notation $(r,b,y,g)^x$ mean that I selected a ball of color $x$ from the given state.)
$$(r,b,y,g;T)^r\rightarrow(r-1,b+1,y,g;T)$$ $$(r,b,y,g;T)^b\rightarrow(r,b,y,g;T)$$ $$(r,b,y,g;T)^y\rightarrow(r,b+1,y,g;T+1)$$ $$(r,b,y>0,g;T)^g\rightarrow(r+1,b,y-1,g-1;T-1)$$ $$(r,b>0,0,g;T)^g\rightarrow(r+1,b-1,1,g-1;T)$$ $$(r,0,0,g;T)^g\rightarrow(r+1,0,0,g-1;T)$$ In words, picking a red replaces the red with a blue, picking a blue does nothing, and picking a yellow adds a blue. Picking a green replaces it with a red; additionally, if there are yellows it removes one, if there are blues but no yellows it replaces a blue with a yellow, and if there are neither it does nothing else.
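For concreteness, here is a small sketch of these transition rules in code; the function names (`step`, `random_step`) are my own:

```python
import random

def step(state, color):
    """Return the state after selecting a ball of the given color.

    A state is a tuple (r, b, y, g); T is implicit as their sum.
    """
    r, b, y, g = state
    if color == "r":
        return (r - 1, b + 1, y, g)          # red is replaced by a blue
    if color == "b":
        return (r, b, y, g)                  # nothing happens
    if color == "y":
        return (r, b + 1, y, g)              # a blue is added (T grows)
    if color == "g":
        if y > 0:
            return (r + 1, b, y - 1, g - 1)  # green -> red, a yellow removed
        if b > 0:
            return (r + 1, b - 1, 1, g - 1)  # green -> red, a blue -> yellow
        return (r + 1, b, y, g - 1)          # green -> red, nothing else
    raise ValueError(color)

def random_step(state, rng=random):
    """Select one of the T balls uniformly at random and apply the rule."""
    r, b, y, g = state
    color = rng.choices("rbyg", weights=[r, b, y, g])[0]
    return step(state, color)
```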
At any given time, I have a choice of either selecting a ball of a non-red color of my choice or selecting a ball uniformly at random (with all $T$ balls equally likely); the random selection is denoted $(r,b,y,g)^z$. Define the "win probability" $P(r,b,y,g)$ of a state as the probability that, if I make the best possible choice at all times (best defined as the choice with the highest expected win probability of the subsequent state), I will eventually reach a state with $r=0$. Define a "winning state" as a state with a win probability of $1$, and a "losing state" as any other state. Let the score $S(r,b,y,g)$ of a winning state be the expected number of balls I will select before reaching an $r=0$ state, assuming I always make the best choice. (In the case where I am already in a winning state and have multiple choices that keep me in a winning state, define the best choice as the one that minimizes my score.)
- What does the set of winning states look like?
- For any given winning state, what is my score?
- What does the set of states where I choose each non-red color look like?
My observations:
In a winning state, I would intentionally select blue iff $S(r,b,y,g)^b+1\leq S(r,b,y,g)^z$ and $S(r,b,y,g)^b\leq S(r,b,y,g)^y$ and $S(r,b,y,g)^b\leq S(r,b,y,g)^g$. This can never happen: selecting a blue leaves the state unchanged, so $S(r,b,y,g)^b=S(r,b,y,g)$, and if blue were the best choice we would have $S(r,b,y,g)=1+S(r,b,y,g)^b=1+S(r,b,y,g)$, which is impossible for a finite score. So I would never select blue intentionally in a winning state. As selecting blue in a losing state would not increase my win probability, I will never select blue intentionally.
In a winning state, I would intentionally select yellow iff $S(r,b,y,g)^y+1\leq S(r,b,y,g)^z$ and $S(r,b,y,g)^y\leq S(r,b,y,g)^b$ and $S(r,b,y,g)^y\leq S(r,b,y,g)^g$. In a winning state, I would intentionally select green iff $S(r,b,y,g)^g+1\leq S(r,b,y,g)^z$ and $S(r,b,y,g)^g\leq S(r,b,y,g)^b$ and $S(r,b,y,g)^g\leq S(r,b,y,g)^y$.
Any state with $r=0$, regardless of the other values, is a winning state with a score of $0$. A single selection in a state $(r,b,0,0)$ has probability $\frac rT$ of going to $(r-1,b+1,0,0)$ and probability $\frac bT$ of remaining unchanged. As such, there is a probability of $(\frac bT)^n$ of not advancing to the state $(r-1,b+1,0,0)$ within $n$ moves, which can be used repeatedly to show that for arbitrarily large $n$, the probability of not advancing down to a state with $r=0$ is arbitrarily small, making all states of the form $(r,b,0,0)$ winning states. Given that it is a winning state, we can calculate a formula for score: $S(r>0,T-r,0,0)=1+\frac {T-r}T S(r,T-r,0,0) + \frac rT S(r-1,T-(r-1),0,0)$, or $S(r>0,T-r,0,0)=\frac Tr +S(r-1,T-(r-1),0,0)$. Unrolling this recursion gives the formula $S(r>0,T-r,0,0)= T \sum_{k=1}^r \frac1k$.
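The closed form can be checked against the one-step recursion with exact rational arithmetic; a quick sketch (the function name `S` is mine):

```python
from fractions import Fraction

def S(r, T):
    """Closed-form score of the state (r, T-r, 0, 0): T times the r-th harmonic number."""
    return T * sum(Fraction(1, k) for k in range(1, r + 1))

# Verify S(r,T-r,0,0) = 1 + ((T-r)/T) S(r,T-r,0,0) + (r/T) S(r-1,T-(r-1),0,0)
# exactly, for a range of jar sizes.
for T in range(2, 15):
    for r in range(1, T + 1):
        lhs = S(r, T)
        rhs = 1 + Fraction(T - r, T) * S(r, T) + Fraction(r, T) * S(r - 1, T)
        assert lhs == rhs
```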
In the case $(r>0,b,y>0,0)$, note that for sufficiently large $b$, $P(r,b,y,0)\approx P(r'=r, b'=b+y,0,0)$. To show this, first note that the chance of selecting a red in a random choice is $\frac rT$; the difference between selecting a red on a random choice now, versus selecting a red after intentionally choosing a yellow (which raises the total to $T+1$), is $\frac rT-\frac r{T+1}=\frac r{T^2+T}$. As the number of blues increases, this becomes increasingly small. As such, if the number of blues is sufficiently large, the difference in effect between a yellow and a blue becomes arbitrarily small, and therefore the state $(r,b,y,0)$ with large $b$ can be approximated by the winning state $(r'=r, b'=b+y, 0,0)$. As such, any green-free state is a winning state. In terms of score, selecting a yellow decreases my chance of selecting a red randomly, so it increases the score, and I will never intentionally select a yellow. However, while I can prove that any green-free state is a winning state, I can't figure out a formula that will give me its score.
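Lacking a formula, the score of a green-free state can at least be estimated by Monte Carlo, assuming (per the argument above) that the best play there is to always make a random selection; this is my own sketch, not a verified optimal policy:

```python
import random

def estimate_score(r, b, y, trials=20000, seed=0):
    """Average number of picks to reach r = 0 from (r, b, y, 0) under always-random play."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        R, B, Y = r, b, y
        picks = 0
        while R > 0:
            u = rng.randrange(R + B + Y)
            if u < R:          # red: replaced by a blue
                R, B = R - 1, B + 1
            elif u < R + B:    # blue: no change
                pass
            else:              # yellow: a blue is added
                B += 1
            picks += 1
        total += picks
    return total / trials

# Sanity check against the exact formula: S(2,2,0,0) = 4 * (1 + 1/2) = 6.
print(estimate_score(2, 2, 0))   # should be close to 6
print(estimate_score(2, 2, 2))   # no known closed form; yellows should push it up
```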
As any green-free state is a winning state, and by intentionally selecting green $g$ times I can turn any state $(r,b,y,g)$ into a green-free state, all states are winning states. Also, $S(r,b,y,g)\leq g+ S(r'=r+g,b'=b,y'=\max(y-g,1),0)$. However, as I can't find a formula for the score of a green-free state that has yellows, I can't find the score of a generic state. Even so, I can say some things for sure: in a yellow-free state I would not intentionally select a green, and in a state where $y\geq g$, I would not intentionally select a yellow.
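The "remove all greens first, then pick randomly" strategy above can be simulated directly; since this strategy need not be optimal, its average pick count is only an upper bound on the true score (the sketch and its names are mine):

```python
import random

def greens_first(state, rng):
    """Number of picks to reach r = 0: g intentional green picks, then random play."""
    r, b, y, g = state
    picks = 0
    while g > 0:               # phase 1: remove every green intentionally
        if y > 0:
            r, y, g = r + 1, y - 1, g - 1
        elif b > 0:
            r, b, y, g = r + 1, b - 1, 1, g - 1
        else:
            r, g = r + 1, g - 1
        picks += 1
    while r > 0:               # phase 2: random picks in a green-free state
        u = rng.randrange(r + b + y)
        if u < r:
            r, b = r - 1, b + 1
        elif u >= r + b:
            b += 1
        picks += 1
    return picks

rng = random.Random(1)
bound = sum(greens_first((2, 1, 1, 2), rng) for _ in range(10000)) / 10000
print("estimated upper bound on S(2,1,1,2):", bound)
```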
Is my reasoning sound so far? Can anyone give me a formula for score in a generic state, or at least in a green-free state? If not, can anyone provide reasonable bounds for said score?