Bellman's Principle of Optimality


I'm currently reading Pham's Continuous-time Stochastic Control and Optimization with Financial Applications, but I'm slightly confused by the way the Dynamic Programming Principle is presented.

In particular, the theorem is stated in terms of both an optimal control and a stopping time. I'm familiar with the analysis when searching only for an optimal control, but not with the stopping time. From Pham:

Theorem (Dynamic Programming Principle)

Let $x\in \mathbb{R}^n$. Then we have \begin{align} v(x)&=\sup_{\alpha \in \mathcal{A}(x)}\sup_{\theta\in \mathcal{T}}E\bigg[\int_0^\theta e^{-\beta s}f(X_s^x, \alpha_s)\,ds+e^{-\beta \theta}v(X^x_\theta)\bigg]\\ &=\sup_{\alpha \in \mathcal{A}(x)}\inf_{\theta\in \mathcal{T}}E\bigg[\int_0^\theta e^{-\beta s}f(X_s^x, \alpha_s)\,ds+e^{-\beta \theta}v(X^x_\theta)\bigg] \end{align}

where $\alpha$ is the control process, $\mathcal{A}(x)$ is the set of admissible controls, $\mathcal{T}$ is the set of stopping times, and $X^x_s$ is the value at time $s$ of the state process started at $x$. The running reward function is $f$ and the value function is $v$. By convention, $e^{-\beta\theta}=0$ when $\theta=\infty$.

I'm unsure of the intuition for why these should be equal. I'm happy with the interpretation of the DPP: the optimization problem can be split into two parts. An optimal control over the whole horizon may be obtained by first searching for an optimal control from time $\theta$ onwards, given the state value, and then maximising over controls the value

\begin{equation} E\bigg[\int_0^\theta e^{-\beta s}f(X_s^x, \alpha_s)ds+e^{-\beta \theta}v(X^x_\theta)\bigg] \end{equation}
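To convince myself of the splitting, I checked it in a toy discrete-time analogue (a two-state discounted MDP with made-up numbers, where $\gamma$ plays the role of $e^{-\beta}$ and deterministic horizons $n$ stand in for stopping times $\theta$). Under the optimal control the split expression equals $v(x)$ for every $n$, which at least seems consistent with the $\sup_\theta$ and $\inf_\theta$ agreeing:

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP.
gamma = 0.9            # discount factor, plays the role of e^{-beta}
P = np.array([         # P[a, x, y] = transition probability x -> y under action a
    [[0.8, 0.2], [0.3, 0.7]],   # action 0
    [[0.5, 0.5], [0.9, 0.1]],   # action 1
])
f = np.array([         # f[x, a] = running reward
    [1.0, 0.5],
    [0.2, 2.0],
])

# Value iteration: v(x) = max_a [ f(x,a) + gamma * sum_y P[a,x,y] v(y) ]
v = np.zeros(2)
for _ in range(10_000):
    q = f + gamma * np.einsum('axy,y->xa', P, v)   # q[x, a]
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-12:
        v = v_new
        break
    v = v_new
policy = q.argmax(axis=1)                          # optimal stationary control

# Chain and reward induced by the optimal control.
P_star = np.array([P[policy[x], x] for x in range(2)])
f_star = np.array([f[x, policy[x]] for x in range(2)])

# Check the splitting identity
#   E[ sum_{s<n} gamma^s f(X_s, a*(X_s)) + gamma^n v(X_n) ] = v(x)
# for every deterministic horizon n, by propagating the state distribution.
for x0 in range(2):
    dist = np.eye(2)[x0]           # distribution of X_s, started at x0
    running = 0.0
    for n in range(1, 30):
        running += (gamma ** (n - 1)) * dist @ f_star
        dist = dist @ P_star       # distribution of X_n
        total = running + (gamma ** n) * dist @ v
        assert abs(total - v[x0]) < 1e-8
print("splitting identity holds for every horizon n")
```

(The numbers and the stationary-policy setup are my own; the point is only that, once $v$ solves the Bellman equation, the split expression does not depend on where you cut.)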

What I'm unsure about is why the $\sup_{\theta \in \mathcal{T}}$ can be replaced by $\inf_{\theta \in \mathcal{T}}$ in the equivalent formulation. When considering just the control problem I understand the intuition, but not with the stopping times.

Any help would be greatly appreciated.