Imagine a two-stage dynamic problem with an information update. At stage $0$, we are in the state of $d$ successes out of $n$ trials, and we must decide whether or not to try now. If we try, we move to one of two possible "states": $d+1$ successes out of $n+1$ trials, or $d$ successes out of $n+1$ trials. At each state, no matter which action we take, we have to pay a cost. At state $(n,d)$, if we take action $0$, we have to pay
$$f(n, d) = \max_{p\in [0, 1]} \sum_{i=0}^{d} {n \choose i} p^i (1-p)^{n-i} \cdot (p-p_0);$$
if we take action $1$, we have to pay
$$g(n, d) = \max_{p\in [0, 1]} \sum_{i=d}^{n} {n \choose i} p^i (1-p)^{n-i} \cdot (p_0-p).$$
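As a sanity check on the definitions, here is a small Python sketch (my own, not part of the problem) that approximates $f$ and $g$ by a grid search over $p$; the grid resolution is an arbitrary assumption.

```python
from math import comb

def binom_pmf(n, i, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def f(n, d, p0, grid=2001):
    """max over p in [0,1] of P(X <= d) * (p - p0), approximated on a grid."""
    ps = [k / (grid - 1) for k in range(grid)]
    return max(sum(binom_pmf(n, i, p) for i in range(d + 1)) * (p - p0) for p in ps)

def g(n, d, p0, grid=2001):
    """max over p in [0,1] of P(X >= d) * (p0 - p), approximated on a grid."""
    ps = [k / (grid - 1) for k in range(grid)]
    return max(sum(binom_pmf(n, i, p) for i in range(d, n + 1)) * (p0 - p) for p in ps)

# Both costs are nonnegative: choosing p = p0 makes either objective equal to 0,
# so the maximum is at least 0.
print(f(10, 4, 0.5), g(10, 4, 0.5))
```

A closed-form maximizer does not seem obvious, which is why the sketch falls back on a grid; any one-dimensional optimizer would do.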
Now, I want to prove that at stage $0$, it is optimal to take action $0$ if we know that the optimal action of state $(n+1, d+1)$ is $0$ at stage $1$. That is, we want to prove:
$$ \max_{p\in [0,1]} \left\{\sum_{i=d}^{n} {n \choose i} p^i (1-p)^{n-i} \cdot (p_0-p) + p\cdot f(n+1, d+1) + (1-p)\cdot f(n+1, d)\right\} > 2\cdot f(n, d)$$
Under the condition that
$$f(n+1, d+1) < g(n+1, d+1).$$
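Before attempting a proof, one can probe the conjectured inequality numerically for particular choices of $(n, d, p_0)$. The sketch below (a self-contained grid-search approximation, with an arbitrary grid size) computes the premise $f(n+1,d+1) < g(n+1,d+1)$ and both sides of the target inequality; it proves nothing, but may build confidence or surface a counterexample.

```python
from math import comb

GRID = [k / 2000 for k in range(2001)]  # crude discretization of [0, 1]

def cdf_term(n, d, p):
    """P(X <= d) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(d + 1))

def sf_term(n, d, p):
    """P(X >= d) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(d, n + 1))

def f(n, d, p0):
    return max(cdf_term(n, d, p) * (p - p0) for p in GRID)

def g(n, d, p0):
    return max(sf_term(n, d, p) * (p0 - p) for p in GRID)

def check(n, d, p0):
    """Return (premise holds?, LHS, RHS) for the conjectured inequality."""
    premise = f(n + 1, d + 1, p0) < g(n + 1, d + 1, p0)
    lhs = max(sf_term(n, d, p) * (p0 - p)
              + p * f(n + 1, d + 1, p0)
              + (1 - p) * f(n + 1, d, p0)
              for p in GRID)
    rhs = 2 * f(n, d, p0)
    return premise, lhs, rhs

premise, lhs, rhs = check(10, 4, 0.5)
print(premise, lhs, rhs)
```

Note that both sides are nonnegative automatically: taking $p = p_0$ zeroes the first summand of the left-hand side and leaves the nonnegative continuation terms, and $f(n,d) \ge 0$ for the same reason.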
I would like to note that $p_0$ can be any fixed value in $[0,1]$.
I would really appreciate it if anyone could offer ideas or discussion on this.