The scenario is such that as long as I have $> 0$ tokens I can continue to play (so I can play into infinity if I have enough money) and I can always opt to stop playing. I can wager any amount as long as I have enough tokens to do so, and I double my wager with probability $p$ and lose my wager with probability $q= 1 - p$ each time I play the game. I begin with $n$ tokens at time $t$. Each attempt at playing the game is independent. Also, if it matters, assume that $p > q$.
In trying to figure out the optimal stopping policy, I went with the common idea that 'stop if the expected value of continuing is less than or equal to the current number of tokens'. But in trying to figure out the expected value of continuing I run into the following issue:
So I am trying to calculate the value of playing the game using the 'dynamic programming'/optimal stopping approach. So at time $t$, the value of one's position and option to play the game is $X_t^{0,n} = \max(n, \mathbb E_t[X_{t+1}^{1,n}], \dots, \mathbb E_t[X_{t+1}^{i,n}], \dots \mathbb E_t[X_{t+1}^{n,n}])$, where $X_i^{k,j}$ is the value of being at time $i$ (with the option of playing the game) having bet $k$ while having amount $j$ at the previous time $i - 1$.
So to calculate $\mathbb E_t[X_{t+1}^{i,n}]$, using the tower property, we have $\mathbb E_t[X_{t+1}^{i,n}] = p \cdot \max(n + i, \mathbb E_{t+1}[X_{t+2}^{k,n+i}])_{k \in [1, n+i]}) + \\ q \cdot \max(n - i, \mathbb E_{t+1}[X_{t+2}^{k,n-i}])_{k \in [1, n-i]})$
This is already looking a bit convoluted, but I think it is 'codeable' recursively so I went with this approach. The issue I face is when I am trying to figure out the conditions for which to stop the evaluations of the expectations recursively. I know that when the token amount is $0$ that the value of the position is $0$.
But I can't seem to figure out how to stop the running of the recursions for when money is constantly being earned. For example, when calculating the cases when the person keeps winning rounds, the calculation of the expectation into the future will just keep going endless since the person can play endlessly.
Another issue is maybe that I am setting the problem up incorrectly, so it'd be helpful to know if there's a better way to figure this out. Also the optimal stopping policy may be much more obvious than doing whatever I am trying. Thanks!
I did not check the details of your recurrence equations, but the issue seems (to me) to be this: the expected values are all infinite.
If I understand correctly, you are trying to calculate the expected value under optimal strategy, right? Instead, lets consider the (sub-optimal?) strategy of just betting one token every time. This is a simple, unit-step-size, 1-D, biased random walk, which is solved here and in some other online course notes. The probability of ever hitting zero is $<1$, i.e. there is a $>0$ probability of never going bankrupt. I cannot find a reference right now, but I'm guessing this, together with the positive expected profit per wager, imply that the expected value for the sub-optimal strategy is infinite - based on the argument that if e.g. you have a $99\%$ chance of going bankrupt and a $1\%$ chance of unlimited growing bankroll, then your expected value is infinite. And of course, if the expected value for the bet-one-token-every-time strategy is infinite, then the expected value for any optimal strategy will also be infinite.
Assuming everything above checks out correctly, then the issue is that asking for optimality in terms of maximizing expected value is not the right question. So the discussion becomes what should be the right question. Here are several modifications I think are related.
First, the best known example of an infinite expected value which is "counter-intuitive" is probably the St Petersburg Paradox. That wikipedia article also discusses several ways to change the question so that it has a more sensible, finite expected value as answer.
Second, if you change your setting and allow fractional wagers, then the problem is often analyzed via the Kelly criterion. This ignores the issue of going bankrupt (which is not what you want) and instead asks for maximizing growth rate (which is intuitively a way of comparing infinities).
Third, and maybe simplest to do, if you put an artificial limit to prevent infinities from happening, then your problem is well defined and the recurrences have finite values. E.g. you can declare there are a maximum of $T= 10000$ turns, or the bankroll can only reach a maximum of $M= 1000000$ tokens, etc.