Help with an optimal control problem


Given a linear time-invariant system:

$$ \dot{x}(t)=Ax(t)+Bu(t) $$

with initial state $ x(0)=x_0 $ and final state $ x(T)=x_T $.

The performance measure to be minimized is:

$$ \int_{0}^{T} \left((x_T-x(t))^T(x_T-x(t))+\rho\, u(t)^Tu(t)\right) dt $$ with $ \rho=100 $.

$(x_T-x(t)) $ is the difference between the state of the system at time $ t $ and the final state. To compute an optimal control $ u^* $ that induces a transition from the initial state $x_0 $ to the target state $x_T$, it is necessary to define the Hamiltonian as:

$$ H(p,x,u,t)=(x_T-x)^T(x_T-x)+\rho u^Tu+p^T(Ax+Bu) $$

from there the next steps are unclear to me. I know I have to apply the Pontryagin minimum principle somehow, but I'm lost and seek some help finding the solution (if possible with rather more detailed explanations than less detailed).


It always helps me to understand a concrete example (same performance measure). So let

$$ A=\begin{bmatrix} -1 & 0.5 \\ 0.3 & -1 \end{bmatrix}, B=\begin{bmatrix} 1\\ 1 \end{bmatrix} $$ and let the initial state be $ x_0=\begin{bmatrix} 1 \\ 0 \end{bmatrix} $ and the final state be $ x_T=\begin{bmatrix} 0 \\ 1 \end{bmatrix} $. Both eigenvalues of $ A $ are negative ($-1\pm\sqrt{0.15}$), so the system is stable, and the controllability matrix $ \begin{bmatrix} B & AB \end{bmatrix} $ has full rank, so the system is controllable.
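The stability and controllability claims can be checked numerically; here is a small NumPy sketch using the matrices above:

```python
import numpy as np

A = np.array([[-1.0, 0.5],
              [0.3, -1.0]])
B = np.array([[1.0],
              [1.0]])

# Stability: both eigenvalues of A should have negative real part.
print(np.linalg.eigvals(A))          # approx. -1 +/- sqrt(0.15)

# Controllability: [B, A@B] should have full rank (= 2).
ctrb = np.hstack([B, A @ B])
print(np.linalg.matrix_rank(ctrb))
```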

What would be the optimal control trajectories in this specific case?

Accepted answer:

The general optimal control problem that Pontryagin minimum principle can solve is of the following form

$$ \min \int_0^T g(t, x(t), u(t))\,dt + g_T(x(T)) \tag{1} $$

with

$$ \dot{x} = f(t, x(t), u(t)), \quad x(0) = x_0. \tag{2} $$

It is sometimes also called the Pontryagin maximum principle; the name depends, I believe, on whether you prefer to minimize a cost or maximize a profit, but in both cases the acronym is PMP. Back to the solution: the PMP states that this problem can be solved with

$$ H(t, x(t), u(t), \lambda(t)) = \lambda(t)^\top f(t, x(t), u(t)) + g(t, x(t), u(t)) \tag{3} $$

where $\lambda(t)$ is called the co-state and has the same dimension as $x(t)$. The dynamics of the optimally controlled system can then be expressed using

$$ \dot{\lambda}(t) = -\left[\frac{\partial}{\partial\,x} H(t, x(t), u(t), \lambda(t))\right]^\top \tag{4} $$

$$ \dot{x}(t) = \left[\frac{\partial}{\partial\,\lambda} H(t, x(t), u(t), \lambda(t))\right]^\top \tag{5} $$

$$ 0 = \frac{\partial}{\partial\,u} H(t, x(t), u(t), \lambda(t)). \tag{6} $$

It can be noted that substituting $(3)$ into $(5)$ gives back $(2)$. If some (or all) components of the state are constrained at the terminal time, $x_i(T) = c_i$, then the corresponding $\lambda_i(T)$ can be chosen freely. But if $x_j(T)$ is free, then $\lambda_j(T)$ is constrained, as defined by

$$ \lambda_j(T) = \left[\frac{\partial\,g_T(x)}{\partial\,x}\right]^\top_{x = x(T)}. \tag{7} $$


In your case $g_T(x(T)) = 0$, all components of the terminal state are constrained, $x(T) = x_T$ (so both $\lambda(0)$ and $\lambda(T)$ are unknown), and $g(t,x(t),u(t))$ and $f(t,x(t),u(t))$ are defined as

$$ g(t,x(t),u(t)) = (x_T - x(t))^\top (x_T - x(t)) + \rho\,u(t)^\top u(t) \tag{8} $$

$$ f(t,x(t),u(t)) = A\,x(t) + B\,u(t). \tag{9} $$

I will omit the time dependency from now on, so $x$ instead of $x(t)$. Substituting $(8)$ and $(9)$ into $(3)$ indeed gives

$$ H(t, x, u, \lambda) = \lambda^\top \left(A\,x + B\,u\right) + (x_T - x)^\top (x_T - x) + \rho\,u^\top u. \tag{10} $$

Substituting $(10)$ into $(4)$ and $(6)$ gives

$$ \dot{\lambda} = -A^\top \lambda - 2\,(x - x_T) \tag{11} $$

$$ 0 = \lambda^\top B + 2\,\rho\,u^\top. \tag{12} $$

Solving $(12)$ gives the optimal control as a function of the co-state

$$ u = -\frac{1}{2\,\rho}B^\top \lambda. \tag{13} $$
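As a sanity check, the steps from $(10)$ to $(11)$–$(13)$ can be verified symbolically, for instance with SymPy (assuming it is available), by differentiating a scalar version of the Hamiltonian:

```python
import sympy as sp

# Scalar symbols for a 2-state, 1-input instance of the Hamiltonian (10)
x1, x2, l1, l2, u, rho = sp.symbols('x1 x2 l1 l2 u rho')
xT1, xT2 = sp.symbols('xT1 xT2')
a11, a12, a21, a22, b1, b2 = sp.symbols('a11 a12 a21 a22 b1 b2')

x, lam, xT = sp.Matrix([x1, x2]), sp.Matrix([l1, l2]), sp.Matrix([xT1, xT2])
A, B = sp.Matrix([[a11, a12], [a21, a22]]), sp.Matrix([b1, b2])

H = (lam.T * (A * x + B * u))[0] + ((xT - x).T * (xT - x))[0] + rho * u**2

# (4): lambda_dot = -dH/dx, which should match (11)
lam_dot = sp.Matrix([-sp.diff(H, xi) for xi in (x1, x2)])
print((lam_dot - (-A.T * lam - 2 * (x - xT))).expand())  # zero vector

# (6): dH/du = 0; solving for u should match (13)
u_opt = sp.solve(sp.diff(H, u), u)[0]
print(sp.simplify(u_opt + (B.T * lam)[0] / (2 * rho)))   # zero
```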

By substituting $(13)$ and $(9)$ into $(2)$ and combining it with $(11)$ then the dynamics of the whole system can be written as

$$ \underbrace{ \begin{bmatrix} \dot{x} \\ \dot{\lambda} \end{bmatrix} }_\dot{z} = \underbrace{ \begin{bmatrix} A & -\frac{1}{2\,\rho}B\,B^\top \\ -2\,I & -A^\top \end{bmatrix} }_{\hat{A}} \underbrace{ \begin{bmatrix} x \\ \lambda \end{bmatrix} }_z + \begin{bmatrix} 0 \\ 2\,I \end{bmatrix} x_T. \tag{14} $$

The "steady state" of this system can be used to bring it into a form whose differential equation is easy to solve; it is found by setting the time derivative to zero and solving for the full state

$$ z_{ss} = -\hat{A}^{-1} \begin{bmatrix} 0 \\ 2\,I \end{bmatrix} x_T. \tag{15} $$
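For the example system, $\hat{A}$ from $(14)$ and the steady state from $(15)$ can be computed as follows (a NumPy sketch, solving a linear system rather than forming the inverse explicitly):

```python
import numpy as np

rho = 100.0
A = np.array([[-1.0, 0.5], [0.3, -1.0]])
B = np.array([[1.0], [1.0]])
n = A.shape[0]
x_T = np.array([[0.0], [1.0]])

# A_hat from (14)
A_hat = np.block([[A,              -B @ B.T / (2 * rho)],
                  [-2 * np.eye(n), -A.T                ]])

# Steady state z_ss from (15)
b = np.vstack([np.zeros((n, 1)), 2 * x_T])
z_ss = -np.linalg.solve(A_hat, b)
print(z_ss.ravel())
```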

By defining new coordinates as $\hat{z} = z - z_{ss}$ then its dynamics can be written as

$$ \dot{\hat{z}} = \hat{A}\,\hat{z} \tag{16} $$

whose solution is simply

$$ \hat{z}(t) = e^{\hat{A}\,t} \hat{z}(0) \tag{17} $$

and therefore, using the definition of $\hat{z}$ together with $(15)$, the solution to $(14)$ can be written as

$$ \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = e^{\hat{A}\,t} \left( \begin{bmatrix} x(0) \\ \lambda(0) \end{bmatrix} + \hat{A}^{-1} \begin{bmatrix} 0 \\ 2\,I \end{bmatrix} x_T \right) - \hat{A}^{-1} \begin{bmatrix} 0 \\ 2\,I \end{bmatrix} x_T. \tag{18} $$

Since you have an analytical solution here, it is possible to find $\lambda(0)$ explicitly. If the dynamics had been non-linear, for instance, this might not be the case, and you might need to resort to something like the shooting method in order to find the $\lambda(0)$ that results in $x(T) = x_T$. In this case an explicit solution for $\lambda(0)$ (and $\lambda(T)$) exists and can be found by evaluating $(18)$ at $t = T$ and substituting in the boundary values for $x$

$$ \begin{bmatrix} x_T \\ \lambda(T) \end{bmatrix} = e^{\hat{A}\,T} \begin{bmatrix} x_0 \\ \lambda(0) \end{bmatrix} + \left(e^{\hat{A}\,T} - I\right) \hat{A}^{-1} \begin{bmatrix} 0 \\ 2\,I \end{bmatrix} x_T. \tag{19} $$

Now by defining $e^{\hat{A}\,T}$ as a matrix constructed out of four sub-matrices

$$ \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} = e^{\hat{A}\,T} \tag{20} $$

it is possible to rewrite $(19)$ and solve for $\lambda(0)$ and $\lambda(T)$

$$ \begin{bmatrix} \lambda(0) \\ \lambda(T) \end{bmatrix} = \begin{bmatrix} -M_{12} & 0 \\ -M_{22} & I \end{bmatrix}^{-1} \left( \begin{bmatrix} M_{11} & -I \\ M_{21} & 0 \end{bmatrix} + \left(e^{\hat{A}\,T} - I\right) \hat{A}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & 2\,I \end{bmatrix} \right) \begin{bmatrix} x_0 \\ x_T \end{bmatrix}. \tag{21} $$
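Putting the pieces together, $(20)$, $(21)$, and $(18)$ can be evaluated numerically, for instance with NumPy and SciPy's `expm` (a sketch for the example system with $T=1$):

```python
import numpy as np
from scipy.linalg import expm

rho, T = 100.0, 1.0
A = np.array([[-1.0, 0.5], [0.3, -1.0]])
B = np.array([[1.0], [1.0]])
n = A.shape[0]
x0 = np.array([[1.0], [0.0]])
xT = np.array([[0.0], [1.0]])

A_hat = np.block([[A, -B @ B.T / (2 * rho)],
                  [-2 * np.eye(n), -A.T]])
b = np.vstack([np.zeros((n, 1)), 2 * xT])

# (20): partition the matrix exponential into four sub-matrices
E = expm(A_hat * T)
M11, M12 = E[:n, :n], E[:n, n:]
M21, M22 = E[n:, :n], E[n:, n:]

# (21): solve for lambda(0) and lambda(T)
lhs = np.block([[-M12, np.zeros((n, n))],
                [-M22, np.eye(n)]])
rhs = (np.block([[M11, -np.eye(n)],
                 [M21, np.zeros((n, n))]])
       + (E - np.eye(2 * n)) @ np.linalg.solve(
             A_hat, np.block([[np.zeros((n, n)), np.zeros((n, n))],
                              [np.zeros((n, n)), 2 * np.eye(n)]])))
lam = np.linalg.solve(lhs, rhs @ np.vstack([x0, xT]))
lam0 = lam[:n]
print(lam0.ravel())

# (18): propagate to t = T and check the terminal error
c = np.linalg.solve(A_hat, b)
zT = E @ (np.vstack([x0, lam0]) + c) - c
print(np.linalg.norm(zT[:n] - xT))
```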


For instance, applying this to your example with $T=1$, equation $(21)$ yields $\lambda_1(0) = 7.6584\cdot 10^{4}$ and $\lambda_2(0) = -6.5076\cdot 10^4$. The resulting trajectories as a function of time, obtained by substituting the initial conditions for $x$ and $\lambda$ into $(18)$, can be seen below.

*(Figure: resulting trajectories.)*

However, due to limited numerical accuracy, it might actually be better for your example to write the optimal control as a time-varying feedback controller. That is, at each time instant solve the optimal control problem again, but using $T-t$ and $x(t)$ instead of $T$ and $x_0$ respectively. These can then be used in $(21)$ to solve for $\lambda(t)$, which gives the optimal control input at time $t$ via $(13)$. For instance, if I solve for $\lambda(0)$ once and plug it into $(18)$, then for $T=1$ the final error is $\|x(T) - x_T\| = 0.0153$, but for $T=5$ I get $\|x(T) - x_T\| = 1.8754$. With the time-varying feedback controller I get $\|x(T) - x_T\| = 7.4196\cdot 10^{-5}$ for $T=1$ and $\|x(T) - x_T\| = 2.7624\cdot 10^{-5}$ for $T=5$. Solving for the time-varying feedback controller is more computationally expensive, because you essentially have to recompute the optimal trajectory at every time instant and can no longer reuse the solution from $(18)$, but it is much more numerically stable.
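A sketch of that time-varying scheme is below (the helper name `costate_at` is mine, not from the answer): at every step it re-solves $(21)$ with the remaining horizon and the current state, computes the input from $(13)$, and here propagates the combined system $(14)$ exactly over one step.

```python
import numpy as np
from scipy.linalg import expm

rho, T = 100.0, 1.0
A = np.array([[-1.0, 0.5], [0.3, -1.0]])
B = np.array([[1.0], [1.0]])
n = A.shape[0]
x0 = np.array([[1.0], [0.0]])
xT = np.array([[0.0], [1.0]])

A_hat = np.block([[A, -B @ B.T / (2 * rho)],
                  [-2 * np.eye(n), -A.T]])
b = np.vstack([np.zeros((n, 1)), 2 * xT])
c = np.linalg.solve(A_hat, b)

def costate_at(x, horizon):
    """Solve (21) with the current state as x0 and the remaining horizon as T."""
    E = expm(A_hat * horizon)
    M11, M12 = E[:n, :n], E[:n, n:]
    M21, M22 = E[n:, :n], E[n:, n:]
    lhs = np.block([[-M12, np.zeros((n, n))], [-M22, np.eye(n)]])
    rhs = (np.block([[M11, -np.eye(n)], [M21, np.zeros((n, n))]])
           + (E - np.eye(2 * n)) @ np.linalg.solve(
                 A_hat, np.block([[np.zeros((n, n)), np.zeros((n, n))],
                                  [np.zeros((n, n)), 2 * np.eye(n)]])))
    return np.linalg.solve(lhs, rhs @ np.vstack([x, xT]))[:n]

N = 100
dt = T / N
Edt = expm(A_hat * dt)
x = x0.copy()
us = []
for k in range(N):
    lam = costate_at(x, T - k * dt)          # re-solve with remaining horizon
    us.append(float(-B.T @ lam / (2 * rho))) # input at this instant, from (13)
    z = Edt @ (np.vstack([x, lam]) + c) - c  # exact one-step propagation of (14)
    x = z[:n]
print(np.linalg.norm(x - xT))                # terminal error
```

Note that this propagates the combined state/co-state system over each step; in a real application only the input from $(13)$ would be applied to the plant.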