Recovering minimum norm control from LQR theory


Recall that for an LTV system,

$$\dot{x}=A\left( t \right)x+B\left( t \right)u,\qquad x\left( {{t}_{0}} \right)={{x}_{0}}$$

provided $(A,B)$ is controllable, one control that takes the state from $x_0$ at time $t_0$ to $x_f = 0$ at some time $t_f$ is given by

$$ u\left( t \right)=-B^T\left( t \right)\phi^T \left( {{t}_{0}},t \right){{W}^{-1}}\left( {{t}_{0}},{{t}_{f}} \right){{x}_{0}} $$

where $\phi$ is the state transition matrix and $W$ is the controllability Gramian. It can be shown that this $u$ is a minimum norm control, in the sense that among all controls achieving this transfer it is the one that minimizes the $L_2$ norm. I understand that this is an open-loop specification. I believe Brockett or Luenberger has a proof of this.
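For concreteness, this open-loop control is easy to check numerically. Below is a minimal sketch, assuming a hypothetical LTI double integrator (not taken from any of the references above), so that $\phi(t_0,t)=e^{A(t_0-t)}$ and $W$ can be computed by quadrature:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Hypothetical example: double integrator, steer x0 to the origin at tf
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
t0, tf = 0.0, 1.0
x0 = np.array([1.0, 0.0])

def phi(ta, tb):
    # LTI state transition matrix: phi(ta, tb) = exp(A (ta - tb))
    return expm(A * (ta - tb))

# Controllability Gramian W(t0, tf) by midpoint quadrature
ss = np.linspace(t0, tf, 2001)
W = np.zeros((2, 2))
for i in range(len(ss) - 1):
    s = 0.5 * (ss[i] + ss[i + 1])
    W += phi(t0, s) @ B @ B.T @ phi(t0, s).T * (ss[i + 1] - ss[i])
Winv = np.linalg.inv(W)

def u(t):
    # Minimum-norm open-loop control u(t) = -B^T phi^T(t0,t) W^{-1}(t0,tf) x0
    return (-B.T @ phi(t0, t).T @ Winv @ x0).item()

def f(t, x):
    return A @ x + B.ravel() * u(t)

sol = solve_ivp(f, (t0, tf), x0, rtol=1e-9, atol=1e-12)
print(np.linalg.norm(sol.y[:, -1]))  # close to zero: the origin is reached
```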

On the other hand if we consider the LQR problem with associated cost functional, $$ J\left( u \right)=\int\limits_{{{t}_{0}}}^{{{t}_{f}}}{{{x}^{\top}}Q\,x+{{u}^{\top}}R\,u}\,dt+{{x}^{\top}}\!\left({{t}_{f}} \right)M\,x(t_f) $$ we get the standard feedback solution $$ u=-{{R}^{-1}}{{B}^{\top}}P\,x $$ in terms of the matrix Riccati differential equation: $$ -\dot{P}=Q+P\,A+{{A}^{\top}}P-P\,B\,{{R}^{-1}}{{B}^{\top}}P $$ with the appropriate boundary condition.

So my question is this: Are the two problems compatible? In other words, is there a choice of $Q,R$ that will let me recover the minimum norm control above from the LQR formulation? Since infinite-horizon LQR requires detectability of $(A,C)$, where $Q={{C}^{\top}}C$, my guess is that simply turning off $Q$ wouldn't do. Also, one solution is in feedback form and the other is open-loop, so it seems these problems must be unrelated. But I can't find a proof or any literature on this.


Accepted Answer

I finally hunted down Brockett's book and it turns out he does make this connection somewhat explicit, but it is not a simple matter of choosing the gain matrices in the cost functional appropriately. However, the hints are there in Mr. van der Veen's answer.


Gramian method

Here is the essence of the idea; I will leave the proof(s) to be consulted in Brockett. Consider the system $$\dot{x}\left( t \right)=A\left( t \right)x\left( t \right)+B\left( t \right)u\left( t \right),\qquad x\left( {{t}_{0}} \right)={{x}_{0}}$$ and define the controllability Gramian $$W\left( {{t}_{0}},{{t}_{1}} \right):=\int\limits_{{{t}_{0}}}^{{{t}_{1}}}{\phi \left( {{t}_{0}},s \right)B\left( s \right)B ^T \left( s \right)\phi ^T \left( {{t}_{0}},s \right)ds}$$ Controllability of the LTV system is equivalent to invertibility of the Gramian; this result is standard. The Gramian satisfies the differential equation $$\frac{d}{dt}W\left( t,{{t}_{1}} \right)=A\left( t \right)W\left( t,{{t}_{1}} \right)+W\left( t,{{t}_{1}} \right){{A}^{T}}\left( t \right)-B\left( t \right){{B}^{T}}\left( t \right),\qquad W\left( {{t}_{1}},{{t}_{1}} \right)=0$$ and the functional equation $$W\left( {{t}_{0}},{{t}_{1}} \right)=W\left( {{t}_{0}},t \right)+\phi \left( {{t}_{0}},t \right)W\left( t,{{t}_{1}} \right){{\phi }^{T}}\left( {{t}_{0}},t \right)$$ The first fact follows from differentiating $W\left( t,{{t}_{1}} \right)$ with respect to $t$ using the Leibniz rule. The second follows from splitting the integral in the definition of the Gramian into the pieces from $t_0$ to $t$ and from $t$ to $t_1$ and using the properties of the state transition matrix.
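Both identities are easy to sanity-check numerically. The following sketch (a hypothetical LTI double integrator, with the Gramians evaluated by midpoint quadrature; an illustration only, not from Brockett) verifies the functional equation:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical LTI example used only to check the functional equation
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])

def phi(ta, tb):
    # LTI state transition matrix
    return expm(A * (ta - tb))

def gram(ta, tb, n=4001):
    # W(ta, tb) from its defining integral, midpoint quadrature
    ss = np.linspace(ta, tb, n)
    W = np.zeros((2, 2))
    for i in range(len(ss) - 1):
        s = 0.5 * (ss[i] + ss[i + 1])
        W += phi(ta, s) @ B @ B.T @ phi(ta, s).T * (ss[i + 1] - ss[i])
    return W

t0, t, t1 = 0.0, 0.4, 1.0
lhs = gram(t0, t1)
rhs = gram(t0, t) + phi(t0, t) @ gram(t, t1) @ phi(t0, t).T
print(np.max(np.abs(lhs - rhs)))  # agrees up to quadrature error
```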

A control achieving the state transfer $x\left( {{t}_{0}} \right)\to x\left( {{t}_{1}} \right)$ is given by $$ u\left( t \right)=-{{B}^{T}}{{\phi }^{T}}\left( {{t}_{0}},t \right){{W}^{-1}}\left( {{t}_{0}},{{t}_{1}} \right)\left[ {{x}_{0}}-\phi \left( {{t}_{0}},{{t}_{1}} \right){{x}_{1}} \right] $$ and in particular this control minimizes the quantity

$$ \int\limits_{{{t}_{0}}}^{{{t}_{1}}}{{{\left\| u\left( t \right) \right\|}^{2}}dt} $$

LQR Case 1: Fixed-time free-endpoint problem

Associate with the LTV system the cost functional $$J\left( u \right)=\int\limits_{{{t}_{0}}}^{{{t}_{1}}}{\left( {{x}^{T}}Qx+{{u}^{T}}u \right)dt}+{{x}^{T}}\left( {{t}_{1}} \right)Mx\left( {{t}_{1}} \right)$$ This is the LQR formulation with $R=I$, as Mr. van der Veen mentions, but it is in particular a fixed-time free-endpoint problem. Its optimal solution rests on the Riccati differential equation (RDE) $$\dot{P}\left( t \right)=-{{A}^{T}}\left( t \right)P\left( t \right)-P\left( t \right)A\left( t \right)+P\left( t \right)B\left( t \right){{B}^{T}}\left( t \right)P\left( t \right)-Q,\qquad P\left( {{t}_{1}} \right)=M$$ and the optimal control law minimizing $J$ is $u=-{{B}^{T}}\left( t \right)P\left( t \right)x$. Suppose $Q=0$. Then the RDE becomes $$\dot{P}\left( t \right)=-{{A}^{T}}\left( t \right)P\left( t \right)-P\left( t \right)A\left( t \right)+P\left( t \right)B\left( t \right){{B}^{T}}\left( t \right)P\left( t \right),\qquad P\left( {{t}_{1}} \right)=M$$

Assume $P(t)$ is invertible. Differentiating the identity $I=P\left( t \right){{P}^{-1}}\left( t \right)$ gives $$\frac{d}{dt}{{P}^{-1}}\left( t \right)=-{{P}^{-1}}\left( t \right)\dot{P}\left( t \right){{P}^{-1}}\left( t \right)$$ and using the RDE we have $$\frac{d}{dt}{{P}^{-1}}\left( t \right)={{P}^{-1}}\left( t \right){{A}^{T}}\left( t \right)+A\left( t \right){{P}^{-1}}\left( t \right)-B\left( t \right){{B}^{T}}\left( t \right)$$ This is essentially the same differential equation the Gramian satisfies. Then it can be shown that $$P\left( t,{{t}_{1}} \right)={{\left[ W\left( t,{{t}_{1}} \right)+\phi \left( t,{{t}_{1}} \right){{M}^{-1}}{{\phi }^{T}}\left( t,{{t}_{1}} \right) \right]}^{-1}}$$ where $W\left( t,{{t}_{1}} \right)$ satisfies the functional equation mentioned above (it enters through the right-hand side of that equation).
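The closed-form expression for $P$ can also be verified numerically. The sketch below (a hypothetical double integrator with $Q=0$ and $M=10I$; an illustration, not Brockett's example) integrates the RDE backwards and compares against $\left[ W(t,t_1)+\phi(t,t_1)M^{-1}\phi^{T}(t,t_1) \right]^{-1}$ at $t=0$:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Hypothetical example: double integrator, Q = 0, terminal weight M
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
t1 = 1.0
M = 10.0 * np.eye(2)

def rde(t, p):
    # Riccati ODE with Q = 0, flattened for solve_ivp
    P = p.reshape(2, 2)
    return (-A.T @ P - P @ A + P @ B @ B.T @ P).ravel()

# Integrate backwards from P(t1) = M down to t = 0
sol = solve_ivp(rde, (t1, 0.0), M.ravel(), rtol=1e-10, atol=1e-12)
P0 = sol.y[:, -1].reshape(2, 2)

def phi(ta, tb):
    return expm(A * (ta - tb))

# W(0, t1) by midpoint quadrature
ss = np.linspace(0.0, t1, 2001)
W = np.zeros((2, 2))
for i in range(len(ss) - 1):
    s = 0.5 * (ss[i] + ss[i + 1])
    W += phi(0.0, s) @ B @ B.T @ phi(0.0, s).T * (ss[i + 1] - ss[i])

# Closed form evaluated at t = 0
P0_closed = np.linalg.inv(W + phi(0.0, t1) @ np.linalg.inv(M) @ phi(0.0, t1).T)
print(np.max(np.abs(P0 - P0_closed)))  # small residual
```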

LQR Case 2: Fixed-time fixed-endpoint problem

Now let the final state be fixed, as in the Gramian method. That is, the boundary conditions are $x\left( {{t}_{0}} \right)={{x}_{0}}$ and $x\left( {{t}_{1}} \right)={{x}_{1}}$.

Then we modify the cost functional (we must, since the terminal cost is no longer relevant) so that $$J\left( u \right)=\int\limits_{{{t}_{0}}}^{{{t}_{1}}}{\left( {{x}^{T}}Qx+{{u}^{T}}u \right)dt}$$ Assume there exists a solution, defined on the whole interval of interest, of the differential equation $$\dot{P}\left( t \right)=-{{A}^{T}}\left( t \right)P\left( t \right)-P\left( t \right)A\left( t \right)+P\left( t \right)B\left( t \right){{B}^{T}}\left( t \right)P\left( t \right)-Q,\qquad P\left( {{t}_{1}} \right)=M$$ Then a trajectory $x(t)$ minimizing $J$ above exists if and only if there exists a trajectory minimizing $$\bar{J}=\int\limits_{{{t}_{0}}}^{{{t}_{1}}}{{{\left\| v\left( t \right) \right\|}^{2}}dt}$$ for the system $$\dot{x}=\left( A\left( t \right)-B\left( t \right){{B}^{T}}\left( t \right)P\left( t \right) \right)x\left( t \right)+B\left( t \right)v\left( t \right)$$ subject to the same boundary conditions. In particular, the Gramian method can be adapted to find a minimum norm control $v^*(t)$ for the modified system, and the optimal control for the original system is then given by $${{u}^{*}}\left( t \right)=-{{B}^{T}}\left( t \right)P\left( t \right)x\left( t \right)+{{v}^{*}}\left( t \right)$$


Proofs of the optimality of the Gramian method and of LQR Case 2 can be found here; that reference summarizes the ideas in Brockett and also mentions, towards the end, the last optimal control law involving $v^*(t)$.

I must comment that neither of the two LQR methods, in its relation to the Gramian, is particularly satisfying. It is also likely that plugging everything in, setting $Q=0$, and doing the torturous algebra for $u^*(t)$ in LQR Case 2, as in Case 1, would make some more structure explicit, but I will attempt that when I have time to kill.


LQR0 - Infinite horizon

I am including this case for completeness though I was not initially considering the infinite horizon case. In retrospect this should have been the first observation: the relationship between the minimum norm control and the LQR formulation is easiest to see here. Consider the LTI system $$ \dot x = A x + Bu $$ with associated cost $$ J(u) = \int \limits _0 ^{\infty} \left( x^T Qx + u^TRu \right) dt $$ Note that for the LTI case the controllability Gramian satisfies the Lyapunov equation $$ AW+W{{A}^{T}}+B{{B}^{T}}=0 $$ (see for example here). Let $Q=0$; the algebraic Riccati equation we get in this case is $$ A^TP + PA + PBR^{-1}B^TP = 0 $$ Multiply from the left and from the right by $P^{-1}$ to get $$ A{{P}^{-1}}+{{P}^{-1}}{{A}^{T}}+B{{R}^{-1}}{{B}^{T}}=0 $$ Let $R=I$ to recover the Lyapunov equation the controllability Gramian satisfies. Note that in the infinite horizon case the final state is always the origin. Then the optimal control from LQR theory (the control law above) and from least squares theory (the control law in the question) coincide in this case with

$$\begin{align} {{P}^{-1}}&={{\phi }^{T}}\left( {{t}_{0}},t \right){{W}^{-1}}\left( {{t}_{0}},{{t}_{1}} \right) \\ \Rightarrow\quad P&=W\left( {{t}_{0}},{{t}_{1}} \right){{\phi }^{T}}\left( t,{{t}_{0}} \right) \end{align}$$
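As a sanity check on the LTI Lyapunov identity used above, here is a minimal numerical sketch (with a hypothetical stable $A$, chosen because the infinite-horizon Gramian exists when $A$ is stable):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Hypothetical stable LTI pair so the infinite-horizon Gramian exists
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])

# Solve A W + W A^T + B B^T = 0 for the controllability Gramian
W = solve_continuous_lyapunov(A, -B @ B.T)

# Cross-check against the defining integral over [0, inf), truncated
ss = np.linspace(0.0, 15.0, 6001)
Wint = np.zeros((2, 2))
for i in range(len(ss) - 1):
    s = 0.5 * (ss[i] + ss[i + 1])
    E = expm(A * s) @ B
    Wint += E @ E.T * (ss[i + 1] - ss[i])

print(np.max(np.abs(A @ W + W @ A.T + B @ B.T)))  # ~ 0, Lyapunov residual
print(np.max(np.abs(W - Wint)))                   # small quadrature error
```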

Answer

I am not really familiar with Gramians, mainly because when I was introduced to them they just seemed a more convoluted way of achieving the same thing as other methods such as LQR. So I can't give you a complete answer with a proof for every step I use.

When I looked it up I found this document on the controllability Gramian approach. Translating the discrete time solution to continuous time, I got this expression

$$ u(t) = -R^{-1}(t)\,B^\top\!(t)\,\Phi^\top\!(t_0,t)\,W_c^{-1}(t_0,t_f)\,x_0 $$

with

$$ W_c(t_0,t_f) = \int_{t_0}^{t_f} \Phi(t_0,t)\,B(t)\,R^{-1}(t)\,B^\top(t)\,\Phi^\top\!(t_0,t)\,dt. $$

This is identical to your expression apart from the $R(t)$ term. When testing this, it seemed that the results from this method are identical to the solution of the following optimal control problem

$$ \begin{align} \min_u\ & J(u) = \int_{t_0}^{t_f} u^\top\!(t)\,R(t)\,u(t)\,dt \\ \textrm{s.t.}\ & \dot{x}(t) = A(t)\,x(t) + B(t)\,u(t) \\ & x(t_f) = 0 \end{align} $$

which requires $R(t)=R^\top\!(t)\succ0\ \forall\,t\in[t_0,t_f]$. However, I do not have a proof that these two problems yield the same solution for $u(t)$.
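While this is no proof either, one can at least check numerically that the weighted-Gramian control achieves the transfer. A sketch with a hypothetical double integrator and a constant $R\succ0$ (both my own choices for illustration):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Hypothetical example: double integrator, constant weight R > 0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[4.0]])
Rinv = np.linalg.inv(R)
t0, tf = 0.0, 1.0
x0 = np.array([1.0, -0.5])

def phi(ta, tb):
    return expm(A * (ta - tb))

# Weighted Gramian W_c(t0, tf) by midpoint quadrature
ss = np.linspace(t0, tf, 2001)
Wc = np.zeros((2, 2))
for i in range(len(ss) - 1):
    s = 0.5 * (ss[i] + ss[i + 1])
    Wc += phi(t0, s) @ B @ Rinv @ B.T @ phi(t0, s).T * (ss[i + 1] - ss[i])
Wcinv = np.linalg.inv(Wc)

def u(t):
    # u(t) = -R^{-1} B^T phi^T(t0,t) W_c^{-1}(t0,tf) x0
    return (-Rinv @ B.T @ phi(t0, t).T @ Wcinv @ x0).item()

def f(t, x):
    return A @ x + B.ravel() * u(t)

sol = solve_ivp(f, (t0, tf), x0, rtol=1e-9, atol=1e-12)
print(np.linalg.norm(sol.y[:, -1]))  # close to zero: x(tf) = 0 is achieved
```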

The optimal control problem with terminal constraint can be approximated by a finite horizon LQR by making its terminal cost larger and larger. So the problem can be formulated as

$$ \begin{align} \min_u\ & J(u) = \int_{t_0}^{t_f} u^\top\!(t)\,R(t)\,u(t)\,dt + x^\top\!(t_f)\,M\,x(t_f) \\ \textrm{s.t.}\ & \dot{x}(t) = A(t)\,x(t) + B(t)\,u(t) \end{align} $$

This is LQR with $Q(t)=0\ \forall\,t\in[t_0,t_f]$ and $R(t)$ identical to that of the previous two problems. When choosing a larger and larger $M$, the LQR will try to bring $x(t_f)$ closer and closer to zero, so in the limit of letting (the eigenvalues of) $M$ go to infinity this LQR becomes identical to the optimal control problem with the terminal constraint.

The solution to the LQR problem can be found using

$$ u(t) = -R^{-1}\!(t)\,B^\top\!(t)\,P(t)\,x(t) $$

$$ -\dot{P}(t) = P(t)\,A(t) + A^\top\!(t)\,P(t) - P(t)\,B(t)\,R^{-1}\!(t)\,B^\top\!(t)\,P(t) $$

with $P(t_f) = M$. So in order to find $P(t)$ you have to integrate the Riccati equation backwards in time, starting at $t_f$ and ending at $t_0$.
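The limiting behaviour can be illustrated numerically. The sketch below (a hypothetical double integrator with $Q=0$, $R=I$, and a moderately large $M=10^3 I$, all my own choices) integrates this Riccati equation backwards, simulates the closed loop, and compares the applied control with the open-loop Gramian control from the question; the two already agree to within a few percent:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Hypothetical example: double integrator, Q = 0, R = I, large terminal M
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
t0, tf = 0.0, 1.0
x0 = np.array([1.0, 0.0])

def phi(ta, tb):
    return expm(A * (ta - tb))

# Open-loop minimum-norm control from the Gramian (R = I)
ss = np.linspace(t0, tf, 2001)
W = np.zeros((2, 2))
for i in range(len(ss) - 1):
    s = 0.5 * (ss[i] + ss[i + 1])
    W += phi(t0, s) @ B @ B.T @ phi(t0, s).T * (ss[i + 1] - ss[i])
Winv = np.linalg.inv(W)

def u_gram(t):
    return (-B.T @ phi(t0, t).T @ Winv @ x0).item()

# LQR: integrate the Riccati ODE backwards from P(tf) = M
Mterm = 1e3 * np.eye(2)

def rde(t, p):
    P = p.reshape(2, 2)
    return (-A.T @ P - P @ A + P @ B @ B.T @ P).ravel()

solP = solve_ivp(rde, (tf, t0), Mterm.ravel(), dense_output=True,
                 rtol=1e-9, atol=1e-10)

def u_lqr(t, x):
    # Feedback law u = -B^T P(t) x evaluated along the simulated trajectory
    P = solP.sol(t).reshape(2, 2)
    return (-B.T @ P @ x).item()

def f(t, x):
    return A @ x + B.ravel() * u_lqr(t, x)

solx = solve_ivp(f, (t0, tf), x0, dense_output=True, rtol=1e-9, atol=1e-10)

ts = np.linspace(t0, tf, 21)
err = max(abs(u_lqr(t, solx.sol(t)) - u_gram(t)) for t in ts)
print(err)  # small, and shrinks as M grows
```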

However, it can be noted that even though your Gramian approach seems to give the same solution for $u(t)$ as the optimal control approach, it only gives the optimal trajectory, while the LQR approach also gives you an optimal policy, which is able to attenuate disturbances acting on the system.