I am working on a question that involves decision making under uncertainty, but I have been stuck on it for a long time. So I formulated a more basic problem in the hope of making progress, but I still don't know how to proceed.
Suppose there is a game where at each stage $t$ we receive a stochastic reward/loss. We may also choose to stop at any stage $t$ and walk away with the rewards collected up to that stage. Let $X_t$ be the random variable denoting the reward we get at the $t$'th stage: $$ X_t = \begin{cases} +1 & \text{with probability } p,\\ -1 & \text{with probability } 1-p, \end{cases} $$ with $p$ unknown. We want to find a stopping time $\tau$ that maximizes the expected total reward $$ \mathop{\mathbb{E}}\left[\sum_{t = 0}^{\tau} X_t\right]. $$ If $p$ were known, this would not be an interesting problem, because the decision to continue or stop would not depend on the outcomes of the coin (play forever if $p > \frac12$, never play if $p < \frac12$). But since we do not know $p$, our decision depends on the previous outcomes; for example, we might find out that it was not sensible to play the game at all!
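For concreteness, here is a minimal sketch of the reward process in Python (the parameter values, function name, and horizon cap are my own assumptions for illustration, not part of the problem statement):

```python
import random

def simulate_rewards(p, horizon, seed=None):
    """Draw X_t = +1 with probability p, -1 with probability 1-p,
    for `horizon` stages."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else -1 for _ in range(horizon)]

rewards = simulate_rewards(p=0.4, horizon=10, seed=0)
total = sum(rewards)  # total reward if we never stop early
```

The difficulty, of course, is that we only observe the realized $X_t$'s one at a time and must decide when to stop without knowing $p$.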
The problem is similar to other exploration-exploitation problems, but I could not find anything directly related to it. I tried estimating $p$ and then finding a threshold for stopping the game, but did not succeed.
I would appreciate any comments, suggestions, or references.
Your best guess of the value of $p$ after $t$ tosses is $$\hat{p}_t=\frac{1+\frac1t\sum_{i=1}^t X_i}{2}.$$ You should play as long as $\hat{p}_t>\frac12$, stop if $\hat{p}_t<\frac12$, and flip a fair coin if $\hat{p}_t=\frac12$ ;-)
In other words, you should play as long as you are winning.
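A minimal sketch of this rule in Python, assuming a finite horizon cap (the cap and all names are my own, not part of the answer). Note that the estimate exceeds $\frac12$ exactly when the running sum $\sum_{i \le t} X_i$ is positive, so the rule only needs the running total:

```python
import random

def estimate_p(outcomes):
    """Since E[X] = 2p - 1, estimate p as (1 + sample mean) / 2."""
    return (1 + sum(outcomes) / len(outcomes)) / 2

def play_while_winning(p, max_stages, seed=None):
    """Play until the running sum goes negative (estimate < 1/2),
    flipping a fair coin to decide whether to continue at ties
    (estimate == 1/2). Returns the total reward collected."""
    rng = random.Random(seed)
    total = 0
    for _ in range(max_stages):
        total += 1 if rng.random() < p else -1
        if total < 0:
            break                       # p-hat < 1/2: stop
        if total == 0 and rng.random() < 0.5:
            break                       # p-hat == 1/2: coin flip
    return total
```

Under this rule the worst possible outcome is a total of $-1$: you stop the moment the running sum first dips below zero.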