Eliminating red balls in a jar with up to four colors

31 Views Asked by At

In a jar, there are $r$ red balls, $b$ blue balls, $y$ yellow balls, and $g$ green balls, for a total of $T=r+b+y+g$ colored balls. Let $(r,b,y,g;T)$ or equivalently $(r,b,y,g)$ represent this state.

I select a ball, look at its color, and adjust the contents of the jar according to its color: (Let the notation $(r,b,y,g)^x$ mean that I selected a ball of $x$ color from a given state:

$$(r,b,y,g;T)^r\rightarrow(r-1,b+1,y,g;T)$$ $$(r,b,y,g)^b\rightarrow(r,b,y,g)$$ $$(r,b,y,g;T)^y\rightarrow(r,b+1,y,g;T+1)$$ $$(r,b,y>0,g;T)^g\rightarrow(r+1,b,y-1,g-1;T-1)$$ $$(r,b>0,0,g;T)^g\rightarrow(r+1,b-1,1,g-1;T)$$ $$(r,0,0,g;T)^g\rightarrow(r+1,0,0,g-1;T)$$ In words, picking a red replaces the red with a blue, picking a blue does nothing, and picking a yellow adds a blue. Picking a green replaces it with a red; additionally, if there are yellows it removes one, and if there are blues but no yellows, it replaces a blue with a yellow.

At any given time, I have a choice of either selecting a ball of a non-red color of my choice or selecting a random ball (with all of the $T$ balls equally likely), with notation of $(r,b,y,g)^z$. Define "win probability" of a state $P(r,b,y,g)$ as the probability that, if I make the best possible choice (best defined as the choice with the highest expected win probability of the subsequent state) at all times, that I will eventually reach a state of $r=0$. Define a "winning state" as a state with a win probability of $1$, and a "losing state" as any other state. Let the score of a winning state $S(r,b,y,g)$ be the expected value of the number of balls I will select before reaching an $r=0$ state, assuming I always make the best choice. (In the case where I am already in a winning state and I have multiple choices that keep me in a winning state, define best choice as the one that minimizes my score).

  1. What does the set of winning states look like?
  2. For any given winning state, what is my score?
  3. What does the set of states where I choose each non-red color look like?

My observations:

In a winning state, I would intentionally select blue iff $S(r,b,y,g)^b+1\leq S(r,b,y,g)^z$ and $S(r,b,y,g)^b\leq S(r,b,y,g)^y$ and $S(r,b,y,g)^b\leq S(r,b,y,g)^g$. The first part of the condition can never occur as $S(r,b,y,g)^b+1=S(r,b,y,g)+1>S(r,b,y,g)^z$. So I would never select blue intentionally in a winning state. As selecting blue in a losing state would not increase my win probability, I will never select blue intentionally.

In a winning state, I would intentionally select yellow iff $S(r,b,y,g)^y+1\leq S(r,b,y,g)^z$ and $S(r,b,y,g)^y\leq S(r,b,y,g)^b$ and $S(r,b,y,g)^y\leq S(r,b,y,g)^g$. In a winning state, I would intentionally select green iff $S(r,b,y,g)^g+1\leq S(r,b,y,g)^z$ and $S(r,b,y,g)^g\leq S(r,b,y,g)^b$ and $S(r,b,y,g)^g\leq S(r,b,y,g)^y$.

A case where $r=0$, regardless of the other values, is a winning state with a score of $0$. A single selection in a state $(r,b,0,0)$ has probability $\frac rT$ of going to $(r-1,b+1,0,0)$ and of $\frac bT$ of remaining unchanged. As such, there is a probability of $(\frac bT)^n$ of not advancing to a state of $(r-1,b+1,0,0)$ within $n$ moves, which be used repeatedly to show that for arbitrarily large $n$, the probability of not advancing down to a state of $r=0$ is arbitrarily small, making all states of the form $(r,b,0,0)$ winning states. Given that it is a winning state, we can calculate a formula for score: $S(r>0,T-r,0,0)=1+\frac {T-r}T S(r,T-r,0,0) + \frac rT S(r-1,T-(r-1)+1,0,0)$, or $S(r>0,T-r,0,0)=\frac Tr +S(r-1,T-(r-1)+1,0,0)$. We can then build this recursively, to come up with the formula $S(r>0,T-r,0,0)= T \sum_{k=1}^r \frac1k$.

In the case $(r>0,b,y>0,0)$, note that for sufficiently large $b$, $P(r,b,y,0)\approx P(r'=r, b'=b+y,0,0)$. To show this, we first state that the chance of randomly selecting a red in a random choice is $\frac rT$; the difference between selecting a red on a random choice now, versus selecting a red after intentionally choosing yellow is $\frac rT-\frac r{T-1}=\frac r{T^2-T}$. As the number of blues increases, this becomes increasingly small. As such, if the number of blues is sufficiently large, the difference in effect between selecting a blue versus selecting a yellow becomes arbitrarily small, and therefore the state $(r,b,y,0)$ with large $b$ can be approximated by the winning state of $(r'=r, b'=b+y, 0,0)$. As such, any green-free state is a winning state; in terms of score, selecting a yellow decreases our chances of selecting a red randomly, so it increases the score, and we will never intentionally select a yellow. However, while I can prove that any green-free state is a winning state, I can't figure out a formula that will give me a score.

As any green-free state is a winning state, and by intentionally selecting green $g$ times I can turn any state $(r,b,y,g)$ into a green-free state, this means that all states are winning states. Also, $S(r,b,y,g)\leq g+ S(r'=r+g,b'=b,y'=\max(y-g,1),0)$. However, as I can't figure out a formula for score in a green-free state that has yellows, I can't find the score for a generic state. Even though I can't find the score in such a state, I can say some things for sure, like that in a yellow-free state I would not intentionally select a green, and that in a state where $y\geq g$, I would not select a yellow.

Is my reasoning sound so far? Can anyone give me a formula for score in a generic state, or at least in a green-free state? If not, can anyone provide reasonable bounds for said score?