Dynamic programming and Bellman optimality principle

239 Views Asked by Bumbble Comm At 23 Feb 2026 - 5:57

Consider $$V(x)=\inf_{u \in \mathcal{U}}\int_{0}^{+\infty} e^{-\lambda t} f^0(y_x(t), u(t))\, d t,$$ which is the value function of an optimal control problem, where $y_x(t)$ is the solution of the state equation \begin{equation}\label{state_eq_finite_dim} \left\{\begin{array}{l} y^{\prime}(t)=f(y(t), u(t)), \quad t>0, \\ y(0)=x \in \mathbb{R}^n. \end{array}\right. \end{equation} Can I say that, at an intuitive level, $V$ should satisfy the following HJB equation \begin{equation*} \lambda v(x)-H(x, \nabla v(x))=0 \quad \text { in } \mathbb{R}^{n}, \end{equation*} where \begin{equation} H(x, p)=\inf _{u \in U}\{f(x, u) \cdot p+f^0(x,u )\}? \end{equation}

By Bellman's optimality principle, we know that for every $t>0$: \begin{equation} V(x)=\inf _{u \in \mathcal{U}}\left\{\int_{0}^{t} f^0\left(y_{x}(s), u(s)\right) e^{-\lambda s} d s+V\left(y_{x}(t)\right) e^{-\lambda t}\right\}. \end{equation}

Is the reason the following: the left-hand side of Bellman's optimality principle is independent of $t$ while the right-hand side depends on $t$, so we can formally differentiate the right-hand side with respect to $t$ and set the derivative equal to zero (which yields the HJB equation)?
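For reference, here is a formal sketch of that computation. It assumes that $V$ is $C^1$, that constant controls $u \equiv u_0 \in U$ are admissible, and that an optimal control $u^*$ exists and is continuous at $t=0$; none of these assumptions is stated above, and without smoothness of $V$ the argument is made rigorous through viscosity solutions. For a control $u$ continuous at $t=0$, write $h_u(t)$ for the expression inside the infimum:

$$
\begin{aligned}
h_u(t) &= \int_0^t f^0\left(y_x(s),u(s)\right)e^{-\lambda s}\,ds + V\left(y_x(t)\right)e^{-\lambda t},\\
h_u'(0) &= f^0\left(x,u(0)\right) + \nabla V(x)\cdot f\left(x,u(0)\right) - \lambda V(x).
\end{aligned}
$$

Bellman's principle gives $h_u(t)\ge V(x)=h_u(0)$ for every control, so $h_u'(0)\ge 0$; with constant controls this reads $\lambda V(x)\le f(x,u_0)\cdot\nabla V(x)+f^0(x,u_0)$ for all $u_0\in U$, i.e. $\lambda V(x)\le H(x,\nabla V(x))$. Along $u^*$ the function $h_{u^*}$ is constant, so $h_{u^*}'(0)=0$ gives $\lambda V(x)=f(x,u^*(0))\cdot\nabla V(x)+f^0(x,u^*(0))\ge H(x,\nabla V(x))$. Together these give $\lambda V(x)-H(x,\nabla V(x))=0$. In this sketch the derivative is set to zero only along the optimal control; for other controls one only gets an inequality, and that inequality is what produces the infimum in $H$.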

Bumbble Comm On 28 Jul 2021 - 6:58 BEST ANSWER

Too long to be a comment:

According to the book you mentioned (page 12), the last expression you wrote, $h(t)=\int_{0}^{t} f^0\left(y_{x}(s), u(s)\right) e^{-\lambda s} d s+V\left(y_{x}(t)\right) e^{-\lambda t}$, will in general depend on $t$. However, by the dynamic programming principle, this quantity is constant in $t$ along the optimal trajectory. This is equivalent to saying that if the fastest route from L.A. to Boston passes through Chicago, then it is also the concatenation of the fastest route from L.A. to Chicago and the fastest route from Chicago to Boston. The same goes for any other midpoint on the optimal route from L.A. to Boston. In your case, this roughly means that for optimal trajectories the functional remains constant regardless of the midpoint $t$: the first term of $h(t)$, i.e. $\int_{0}^{t} f^0\left(y_{x}(s), u(s)\right) e^{-\lambda s} d s$, which is the cost up to $t$, and the second term, i.e. $V\left(y_{x}(t)\right) e^{-\lambda t}$, which is the (discounted) optimal cost-to-go, are both optimal for any $t$. Thus $h'(t)=0$ along the optimal route, since $h$ is constant there. The next steps in the reference you gave then lead to the HJB equation. Is this the part you wanted clarified? Please let me know.
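To see the claim about $h$ numerically, here is a minimal sketch on a hypothetical linear-quadratic example that is not taken from the question or the book: $y'=u$ with $u\in\mathbb{R}$, running cost $f^0(y,u)=y^2+u^2$, and discount rate $\lambda>0$. Plugging $V(x)=Px^2$ into the HJB equation from the question gives $P^2+\lambda P-1=0$ and the optimal feedback $u=-Py$. The helper names `riccati_coefficient` and `h_values` are made up for this illustration. The sketch evaluates $h(t)$ with the trapezoid rule along the feedback $u=-ky$: with $k=P$ it should stay constant up to quadrature error, and with any other gain it should only increase.

```python
import math

# Hypothetical linear-quadratic example (not taken from the question or the book):
#   state equation  y'(t) = u(t),  y(0) = x,  controls u in R
#   running cost    f0(y, u) = y**2 + u**2,  discount rate lam > 0
# Plugging V(x) = P*x**2 into  lam*V = min_u {V'(x)*u + x**2 + u**2}
# gives P**2 + lam*P - 1 = 0, with optimal feedback u = -P*y.


def riccati_coefficient(lam):
    """Positive root P of P**2 + lam*P - 1 = 0, so that V(x) = P*x**2."""
    return (-lam + math.sqrt(lam * lam + 4.0)) / 2.0


def h_values(x, k, lam, t_max=5.0, n=50_000):
    """Sample h(t) = int_0^t e^{-lam*s} f0(y, u) ds + e^{-lam*t} V(y(t))
    on a uniform grid of [0, t_max], along the feedback control u = -k*y.
    The integral is accumulated with the trapezoid rule."""
    P = riccati_coefficient(lam)
    dt = t_max / n

    def y(s):
        # Exact solution of y' = -k*y, y(0) = x.
        return x * math.exp(-k * s)

    def integrand(s):
        # Discounted running cost f0(y, u) with u = -k*y.
        ys = y(s)
        return math.exp(-lam * s) * (ys ** 2 + (k * ys) ** 2)

    hs = [P * x ** 2]  # h(0) = V(x)
    running = 0.0
    for i in range(1, n + 1):
        s0, s1 = (i - 1) * dt, i * dt
        running += 0.5 * dt * (integrand(s0) + integrand(s1))
        hs.append(running + math.exp(-lam * s1) * P * y(s1) ** 2)
    return hs


if __name__ == "__main__":
    lam, x = 0.5, 2.0
    P = riccati_coefficient(lam)
    optimal = h_values(x, P, lam)           # gain k = P: optimal control
    suboptimal = h_values(x, 2.0 * P, lam)  # gain k = 2P: not optimal
    # Largest deviation of h(t) from h(0) = V(x); only quadrature error remains.
    print(max(abs(v - optimal[0]) for v in optimal))
    # h is nondecreasing along the suboptimal control.
    print(all(b >= a - 1e-12 for a, b in zip(suboptimal, suboptimal[1:])))
    # Total increase of h: approximately the extra cost paid compared with V(x).
    print(suboptimal[-1] - suboptimal[0])
```

Since $V(y_x(t))e^{-\lambda t}\to 0$ here, $h(t)$ tends to the total cost of the chosen control, so the rise of $h$ along a suboptimal control is exactly its excess cost over $V(x)$. This is why, in this example, only the optimal control gives $h'(t)=0$.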
