I'm confident that there is a well-known solution to this problem, but I'm having trouble finding a reference for it. I'm also quite rusty on these kinds of problems and haven't managed to solve it myself. Any help would be very much appreciated!
Consider the following bandit-like problem:
Time $t$ takes values in $[0,\infty)$. Suppose a "reward" $Y\in\{L,H\}$ is randomly drawn with $\mathbb{P}\{Y=H\} = p$. Given $Y=y$, arrivals occur according to a Poisson process with rate $\lambda_y>0$ ($y\in\{L,H\}$).
A decision-maker (henceforth "DM") does not observe $Y$. As long as no arrival has occurred, the DM can stop the process and earn a payoff of $u_M\in\mathbb{R}$. At the moment of the first arrival, the process stops automatically and the DM earns a payoff of $u_{y}\in\mathbb{R}$ ($y\in\{L,H\}$). Assume that $u_L<u_M<u_H$. What is the optimal time to stop the process?
Thanks again and apologies for the loose description. Please let me know if anything needs to be clarified!
This expands my comment from yesterday: Consider the policy of fixing a (deterministic) time $x \in [0,\infty)$ and stopping at time $x$ if no arrival occurs before then. Let $g(x)$ be the expected reward under this policy. It is not difficult to write an exact expression for $g(x)$ in terms of $p$, $\lambda_H, \lambda_L, u_M, u_H, u_L$. Clearly $$u_L \leq g(x) \leq u_H \quad \forall x \in [0, \infty) $$ Now maximize $g(x)$ over $x \in [0, \infty]$ by considering the critical points. In particular compare $g(0)$, $g(\infty)$, and $g(c)$ for a point $c \in (0, \infty)$ such that $g'(c)=0$. Clearly \begin{align} g(0) &=u_M\\ g(\infty) &= pu_H + (1-p)u_L \end{align} so it remains only to find $c$ and $g(c)$.
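For what it's worth, conditioning on $Y$ (and using the fact that the first arrival time given $Y=y$ is exponential with rate $\lambda_y$) gives, unless I've slipped a sign:
$$g(x) = p\left[e^{-\lambda_H x}u_M + \left(1-e^{-\lambda_H x}\right)u_H\right] + (1-p)\left[e^{-\lambda_L x}u_M + \left(1-e^{-\lambda_L x}\right)u_L\right]$$
so
$$g'(x) = p\lambda_H e^{-\lambda_H x}(u_H-u_M) - (1-p)\lambda_L e^{-\lambda_L x}(u_M-u_L)$$
and, when $\lambda_H \neq \lambda_L$, the equation $g'(c)=0$ has the unique solution
$$c = \frac{1}{\lambda_L-\lambda_H}\,\ln\!\left(\frac{(1-p)\lambda_L(u_M-u_L)}{p\lambda_H(u_H-u_M)}\right)$$
provided this value lies in $(0,\infty)$; otherwise the maximum of $g$ is attained at an endpoint. One can check $g(0)=u_M$ and $g(x)\to pu_H+(1-p)u_L$ as $x \to \infty$, matching the expressions above.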
Are deterministic policies optimal? Yes. Let $X$ be any nonnegative random variable chosen independently of the system. Suppose we decide to stop at time $X$. Our reward is $E[g(X)]$, which we know is finite (it is in $[u_L, u_H]$).
Claim: There must be a deterministic $x \in [0, \infty)$ such that $g(x)\geq E[g(X)]$.
Proof (by contradiction): Suppose not, and define $m=E[g(X)]$. Then $g(x)<m$ for all $x \in [0, \infty)$. Since $X \in [0, \infty)$ surely, we have $g(X) < m$ surely, so $m-g(X)$ is a positive (and bounded) random variable, and hence $E[m-g(X)]>0$. But $E[m-g(X)] = m - E[g(X)] = m - m$, so this says $m-m>0$, a contradiction. $\Box$
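In case a numerical sanity check is useful, here is a short Python sketch. All parameter values are made up for illustration (chosen so $u_L < u_M < u_H$), and the function names `g`, `g_mc` are mine. It encodes the closed form for $g(x)$ obtained by conditioning on $Y$, estimates the same quantity by Monte Carlo simulation of the "stop at $x$ if no arrival by $x$" policy, and checks that for these parameters the interior critical point $c$ beats both endpoints $g(0)=u_M$ and $g(\infty)=pu_H+(1-p)u_L$.

```python
import math
import random

# Hypothetical parameters for illustration only; u_L < u_M < u_H as required.
p, lam_H, lam_L = 0.5, 2.0, 0.5
u_L, u_M, u_H = 0.0, 1.0, 2.0

def g(x):
    """Closed-form expected reward of the policy: stop at x if no arrival by x.

    Condition on Y: given Y=y, the first arrival time is Exp(lam_y), so the
    probability of no arrival before x is exp(-lam_y * x)."""
    return (p * (math.exp(-lam_H * x) * u_M + (1 - math.exp(-lam_H * x)) * u_H)
            + (1 - p) * (math.exp(-lam_L * x) * u_M + (1 - math.exp(-lam_L * x)) * u_L))

def g_mc(x, n=200_000, seed=0):
    """Monte Carlo estimate of the same policy's expected reward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        high = rng.random() < p                       # draw Y
        t = rng.expovariate(lam_H if high else lam_L)  # first arrival time
        total += u_M if t > x else (u_H if high else u_L)
    return total / n

# Interior critical point solving g'(c) = 0 (valid here: the log argument is
# positive and c lands in (0, infinity) for these parameters).
c = math.log(((1 - p) * lam_L * (u_M - u_L))
             / (p * lam_H * (u_H - u_M))) / (lam_L - lam_H)

print("c =", c)
print("g(0) =", g(0.0), " g(c) =", g(c), " g(inf) =", p * u_H + (1 - p) * u_L)
print("Monte Carlo g(c) =", g_mc(c))
```

For this parameter set the interior stopping time is strictly better than stopping immediately or never stopping, so the maximum is attained at $c$; with other parameters the comparison of $g(0)$, $g(c)$, $g(\infty)$ can of course go the other way.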