Optimal Control and Dynamic Programming Principle


I am looking at the dynamic programming principle in optimal control problems. I am reading a book on the subject; the statement of the problem and the approach are given below. The book is "Non-Cooperative Stochastic Differential Game Theory of Generalized Markov Jump Linear Systems" by Cheng-Ke Zhang et al.

The dynamics is given by,

$\dot{x} = f(t, x, u), \ x(0) = x_0$

We are trying to minimize a performance index (PI),

$\underset{u}{\min} \left[ \int_0^T g(s, x(s), u(s)) \, ds + q(x(T)) \right]$

where $u$ are admissible controls.

The dynamic programming principle is given in Chapter 2, Section 2.1.1 (Dynamic Programming), and can be stated as follows (p. 18).

"A set of controls $u^*(t) = \phi^*(t,x)$ constitutes an optimal control solution to the control problem stated above if there exists a continuously differentiable function $V(t,x) : [0,T] \times \mathbb{R}^n \to \mathbb{R}$ satisfying the Bellman equation,

$-V_t(t,x) = \underset{u}{\min} \left[ g(t, x, u) + V_x(t,x) f(t, x, u) \right] = g(t, x, \phi^*(t,x)) + V_x(t,x) f(t, x, \phi^*(t,x)),$ with the terminal condition $V(T,x) = q(x)$."
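To make the Bellman equation concrete, here is a small numerical check on a toy scalar linear-quadratic problem (my own example, not from the book). For $\dot{x} = ax + bu$ with cost $\int_0^T (qx^2 + ru^2)\,ds + q_f x(T)^2$, the value function is $V(t,x) = P(t)x^2$ with $P$ solving a Riccati ODE, and the HJB residual at the minimizing control should vanish (up to discretization error):

```python
import numpy as np

# Toy scalar LQR problem (hypothetical example, NOT from the book):
#   dx/dt = a*x + b*u,  cost = \int_0^T (q*x^2 + r*u^2) dt + qf*x(T)^2.
# Here V(t, x) = P(t)*x^2, where P solves the Riccati ODE
#   -dP/dt = 2*a*P + q - (b**2/r)*P**2,  P(T) = qf,
# and the Bellman minimizer is u*(t, x) = -b*P(t)*x/r.
a, b, q, r, qf, T = 1.0, 1.0, 1.0, 1.0, 1.0, 1.0
N = 20000
dt = T / N

# Integrate the Riccati equation backward in time (explicit Euler).
P = np.empty(N + 1)
P[N] = qf
for k in range(N, 0, -1):
    P[k - 1] = P[k] + dt * (2 * a * P[k] + q - (b**2 / r) * P[k] ** 2)

# Check -V_t = min_u [ g + V_x * f ] at an interior point (t, x).
k, x = N // 2, 0.7
Vt = (P[k + 1] - P[k - 1]) / (2 * dt) * x**2   # V_t by central difference
Vx = 2 * P[k] * x
u_star = -b * P[k] * x / r
bracket = lambda u: q * x**2 + r * u**2 + Vx * (a * x + b * u)

print(abs(-Vt - bracket(u_star)))       # HJB residual: small (discretization only)
print(bracket(u_star) <= bracket(0.3))  # True: u* attains the minimum
```

This only illustrates the statement on one problem instance; it is not a proof, and all parameter values above are arbitrary choices.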

The proof goes as follows,

$V = \underset{u}{\min} \left[ \int_0^T g(s, x(s), u(s)) \, ds + q(x(T)) \right]$, satisfying the boundary condition $V(T, x^*(T)) = q(x^*(T))$, and $\dot{x}^*(s) = f(s, x^*(s), \phi^*(s, x^*(s))), \ x^*(0) = x_0$. Consider another set of strategies $u(s) \in \mathcal{U}_m$ with corresponding trajectories $x(s)$; then from the Bellman condition we have,

$g(t, x, u) + V_x(t, x) f(t, x, u) + V_t(t, x) \ge g(t, x^*, u^*) + V_x(t, x^*) f(t, x^*, u^*) + V_t(t, x^*)$

Now the book claims that integrating by parts produces the following result: $\int_0^T g(s, x(s), u(s)) \, ds + V(T, x(T)) - V(0, x_0) \ge \int_0^T g(s, x^*(s), u^*(s)) \, ds + V(T, x^*(T)) - V(0, x_0)$
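As a sanity check, the integrated inequality can be confirmed numerically on the same kind of toy scalar LQR problem (a hypothetical example, not from the book): any admissible control accrues at least the cost of the optimal feedback.

```python
import numpy as np

# Sanity check of the integrated inequality on a toy scalar LQR problem
# (hypothetical example, NOT from the book): for any admissible control u,
# total cost under u is >= total cost under the optimal feedback
# u*(t, x) = -b*P(t)*x/r, where P(t) is the Riccati gain.
a, b, q, r, qf, T, x0 = 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0
N = 20000
dt = T / N

# Riccati gain P(t) for the optimal feedback (backward explicit Euler).
P = np.empty(N + 1)
P[N] = qf
for k in range(N, 0, -1):
    P[k - 1] = P[k] + dt * (2 * a * P[k] + q - (b**2 / r) * P[k] ** 2)

def cost(policy):
    """Euler-simulate x(s) under u = policy(k, x) and accumulate the cost."""
    x, J = x0, 0.0
    for k in range(N):
        u = policy(k, x)
        J += dt * (q * x**2 + r * u**2)
        x += dt * (a * x + b * u)
    return J + qf * x**2

J_opt = cost(lambda k, x: -b * P[k] * x / r)   # optimal feedback
J_zero = cost(lambda k, x: 0.0)                # a suboptimal control
J_lin = cost(lambda k, x: -2.0 * x)            # another suboptimal control
print(J_opt <= J_zero, J_opt <= J_lin)         # True True
```

The simulated optimal cost also matches $V(0, x_0) = P(0)x_0^2$ up to discretization error, which is the value-function interpretation of the left-hand side.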

The question is: how did we get to this result? Integrating the $V_x$ term, we have $\int_0^T V_x(s, x) f(s, x, u) \, ds = \int_0^T V_x(s, x) \frac{d x}{d s} \, ds = \int_0^T V_x(s,x) \, dx = V(T, x(T)) - V(0, x_0)$

We get a similar term from the integral $\int_0^T V_s(s, x) \, ds = V(T, x(T)) - V(0, x_0)$. What am I missing? When you put it all together, we end up with,

$\int_0^T g(s, x(s), u(s)) \, ds + 2V(T, x(T)) - 2V(0, x_0) \ge \int_0^T g(s, x^*(s), u^*(s)) \, ds + 2 V(T, x^*(T)) - 2 V(0, x_0)$

Thank you in advance for all the responses.


I believe I figured out the answer; it is quite straightforward. Here it goes.

$g(t, x, u) + V_x(t,x) f(t,x,u) + V_t(t,x) = g(t,x,u) + V_x(t,x) \dot{x} + V_t(t,x) = g(t,x,u) + \frac{d}{dt} V(t,x(t))$

where, by the chain rule along the trajectory,

$\frac{d}{dt} V(t,x(t)) = V_x \dot{x} + V_t$

So $V_x \dot{x}$ and $V_t$ are the two pieces of a single total derivative: neither piece alone integrates to $V(T,x(T)) - V(0,x_0)$, but their sum does, exactly once. That is why no factor of $2$ appears.
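The chain-rule point can be seen numerically with arbitrary smooth (hypothetical) choices of $V$ and $x(\cdot)$: the sum $V_t + V_x \dot{x}$ integrates to the increment of $V$ exactly once.

```python
import numpy as np

# Chain-rule check with arbitrary smooth choices (hypothetical, for
# illustration only): along a trajectory x(s),
#   d/ds V(s, x(s)) = V_t + V_x * dx/ds,
# so integrating V_t and V_x*xdot TOGETHER gives V(T, x(T)) - V(0, x(0))
# exactly once -- neither piece alone equals that increment.
V = lambda t, z: t * z**2      # a smooth V(t, x)
Vt = lambda t, z: z**2         # partial dV/dt
Vx = lambda t, z: 2 * t * z    # partial dV/dx
x = lambda s: np.exp(s)        # a trajectory x(s)
xdot = lambda s: np.exp(s)     # its derivative

T, N = 1.0, 200000
s = (np.arange(N) + 0.5) * (T / N)                 # midpoint quadrature nodes
lhs = np.sum(Vt(s, x(s)) + Vx(s, x(s)) * xdot(s)) * (T / N)
rhs = V(T, x(T)) - V(0.0, x(0.0))
print(abs(lhs - rhs))   # ~0 (quadrature error only)
```

With these choices $\int_0^T (V_t + V_x \dot{x})\,ds = T e^{2T} = V(T, x(T)) - V(0, x(0))$, which the midpoint rule reproduces to high accuracy.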