Why does the Hamilton-Jacobi-Bellman equation lead to an optimal control law?


I am an electrical engineer, and I am currently reading some literature on control engineering.

I read the following assertion, which is presented without any proof:

Let $\dot x = f(x,u)$ be an autonomous control system, where $f : U \times \mathbb{R}^m \rightarrow \mathbb{R}^n$ is smooth, $U \subseteq \mathbb{R}^n$ is open, and $x \in U$. Suppose moreover that for each continuous $u(\cdot) : [0, \infty) \rightarrow \mathbb{R}^m$, the solution $x$ of the system $\dot x(t) = f(x(t),u(t))$ exists uniquely on $[0, \infty)$.

Suppose $l \geq 0$ is a smooth function on $U \times \mathbb{R}^m$, and suppose moreover that the value function $\pi : U \rightarrow \mathbb{R}$ with respect to the cost $\int_0^\infty l(x,u) \, dt$, i.e. $\pi(x_0) = \inf_{u(\cdot)} \int_0^\infty l(x(t),u(t)) \, dt$, is smooth.

Then,

  • $ \min_{u \in \mathbb{R}^m }[ {\nabla \pi (x) \cdot f(x,u) + l(x,u)}]= 0$ for each $x \in U$ provided that the left-hand-side minimum exists.

  • If $K:U \rightarrow \mathbb{R}^m$ is a smooth map such that, for each $x \in U$, $K(x)$ is a minimizer of $$\min_{u \in \mathbb{R}^m }[ {\nabla \pi (x) \cdot f(x,u) + l(x,u)}],$$ then for each $x_0 \in U$, $$\pi(x_0) = \int _0 ^ \infty l(x(t),K(x(t))) \, dt,$$ where $x$ is the unique solution of $\dot x = f(x, K(x))$ with $x(0) = x_0$ (assuming such solutions exist uniquely for all time).
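To convince myself the statement is at least plausible, I worked out a toy instance (my own example, not from the text I am reading): take $\dot x = u$ and $l(x,u) = x^2 + u^2$. The minimum condition gives $\pi(x) = x^2$ and $K(x) = -x$, and one can check numerically that the closed-loop cost equals $\pi(x_0)$:

```python
# Toy instance of the theorem (my own example):
# dynamics x' = u, running cost l(x,u) = x^2 + u^2.
# Minimizing pi'(x) u + x^2 + u^2 over u gives u = -pi'(x)/2, and the
# identity min_u [...] = 0 then forces pi(x) = x^2, hence K(x) = -x.

def closed_loop_cost(x0, dt=1e-4, T=20.0):
    """Integrate l(x, K(x)) along x' = f(x, K(x)) = -x by explicit Euler."""
    x, cost = x0, 0.0
    for _ in range(int(T / dt)):
        u = -x                      # K(x) = -x
        cost += (x**2 + u**2) * dt  # l(x, u) dt
        x += u * dt                 # x' = f(x, u) = u
    return cost

x0 = 1.5
print(closed_loop_cost(x0), x0**2)  # the two numbers nearly agree
```

Here the closed-loop trajectory is $x(t) = x_0 e^{-t}$, so the integral can also be evaluated by hand: $\int_0^\infty 2 x_0^2 e^{-2t} \, dt = x_0^2 = \pi(x_0)$.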

I want to find a proof of the above horrible-looking theorem. I have some background in mathematics: I know how to use measure theory, Lebesgue integration, and functional analysis, and I am comfortable with standard real analysis.

Can anyone provide me with some references? If a reference is not available, I will be very happy if someone directly provides a proof!

On BEST ANSWER

This is a standard result in the theory of optimal control and dynamic programming. It is often called a verification theorem.

Most modern texts on optimal control should cover (possibly slight variations of) the result you state. For example, see Fleming and Soner, *Controlled Markov Processes and Viscosity Solutions* (2nd ed., 2006), Chapter I. Theorems 5.1, 6.1, and 7.1 might be of particular interest to you.
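For intuition, here is a sketch (no substitute for the references, and assuming in addition that $\pi(x(t)) \rightarrow 0$ as $t \rightarrow \infty$ along the closed-loop trajectory) of how the second bullet follows from the first. Let $x(\cdot)$ solve $\dot x = f(x, K(x))$ with $x(0) = x_0$. Since $K(x)$ attains the minimum in the first bullet, the chain rule gives $$\frac{d}{dt}\,\pi(x(t)) = \nabla \pi(x(t)) \cdot f(x(t), K(x(t))) = -\,l(x(t), K(x(t))).$$ Integrating from $0$ to $T$ yields $$\pi(x(T)) - \pi(x_0) = -\int_0^T l(x(t), K(x(t))) \, dt,$$ and letting $T \rightarrow \infty$, the decay assumption gives $$\pi(x_0) = \int_0^\infty l(x(t), K(x(t))) \, dt,$$ which is the claimed identity. The decay assumption is exactly the kind of extra infinite-horizon condition I mention next.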

As an aside, given the infinite time horizon, I believe you need extra regularity conditions for the result you state to be true. Fleming and Soner should help you figure out the details, if necessary.