Optimal stopping time for coin toss with unkown bias

Question

Optimal stopping time for coin toss with unkown bias

401 Views Asked by Bumbble Comm At 10 May 2026 - 6:26

I am working on a question that involves uncertainty and decision making, but I realized I am not making progress for a long time. That is why I formulated a more basic problem in the hope that I can make progress but I still don't know how to proceed.

Suppose there is a game where at each stage $t$ we get a stochastic reward/loss. We can also choose to stop at any stage $t$ and move away with rewards collected until that stage. Let $X_t$ be the random variable that denotes the reward we get at $t$'th stage. $$ X_t = +1 \text{ with probability } p\\ X_t = -1 \text{ with probability } 1 -p $$ with $p$ unkown. We want to find $\tau$ such that $$ \sum_{t = 0}^{\tau} \mathop{\mathbb{E}}[X_t] $$ is maximized. If $p$ were known, this would not be an interesting problem because the decision of continuing or stopping does not depend on the outcomes of the coin. But since we do not know $p$, our decision depends on the previous outcomes, for example, we might find out that it was not logical to play the game at all !

The problem is similar to other exploration-exploitation problems but I could not find anything related to my problem. I tried estimating $p$ and then tried to find a threshold for stopping the game but could not succeed.

I would appreciate any kind of comment, suggestion or references.

Original Q&A

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2018-09-05 16:03:06

Your best guess of the value of $p$ after $t$ tosses is $$\hat{t}=\frac{1+\frac1t\sum_1^tX_i}2$$ You should play as long as $\hat{t}>0$, stop if $\hat{t}<0$, and flip a fair coin if $\hat{t}=0$ ;-)

IOW, you should play as long as you are winning.

**Bumbble Comm** · Answer 2 · 2018-09-05 16:12:44

I don't think the question can be answered without some guess at the distribution of $p$ and a limit on the number of games you can play. If we assume $p$ is chosen uniformly on $[0,1]$ there is $\frac 12$ chance the game is favorable and you should play forever, making the expectation infinite. I think it becomes an interesting problem if you put a limit on the number of games. Let's say the limit is $1,000,000$ plays.

Clearly you should start playing. You can win a lot if $p$ is large and lose just a little if you lose so many of the early tosses you decide $p \lt \frac 12$. The strategy will be a set of pairs $(W,L)$ that say if you have won $W$ and lost $L$ you should stop playing.

We can approximate the value of the game. If $p \gt \frac 12$ and you figure it out you want to play all the allowed tosses. You won't be wrong very often, so the value is about $250,000$ less the cost of figuring out that $p \lt \frac 12$, which will be small by comparison.

We can start by finding how many initial losses should convince you to stop. If you lose the first toss your estimate of the distribution of $p$ is $2(1-x)$ There is still a substantial chance that $p$ is quite high and you will make a lot of money, you were just unlucky the first time. If you lose the first $L$ tosses your estimate of the distribution is $(L+1)(1-x)^L$.

I am going to make two simplifying assumptions here. First is that the value of $L$ we want is very small compared to $1,000,000$. Second is that if $p \gt \frac 12$ and you win the $(L+1)$st toss you are not so unlucky that you give up. Your expected win is then $(999,999-L)p-(L-1)$. $L$ will be large enough that it is almost sure you will lose the $(L+1)$st toss. We want to use our estimate of the distribution of $p$ to equate these, ignoring a couple of $1$s. $$\int_{\frac 12}^1((10^6-L)(L+1)(1-x)^L-L)x\;dx=1$$ where the factor $x$ comes from the requirement that we win the $(L+1)st$ toss. Alpha tells me the integral is $$2^{-L-1}(10^6-L)-\frac {3L}8$$ so you shouldn't quit until you lose the first $17$ in a row. I haven't figured how many you should be willing to lose after winning one before you quit.

If you have played a while and have $W$ wins and $L$ losses your estimate of the distribution of $p$ is $\frac {\Gamma(L+W+2)}{\Gamma(L+1)\Gamma(W+1)}x^W(1-x)^L$. I think a reasonable heuristic is to quit when the integral of this over $(\frac 12,1)$ is less than a number like one over a hundred thousand or quarter million.

Optimal stopping time for coin toss with unkown bias

There are 2 best solutions below

Related Questions in PROBABILITY

Related Questions in STOPPING-TIMES

Related Questions in OPTIMAL-CONTROL

Trending Questions

Popular # Hahtags

Popular Questions