Stabilizing controls in linear quadratic regulator


I am studying a linear quadratic control problem with discounting. For $\gamma \in (0,1)$, $Q \succeq 0$ and $R \succ 0$ and linear dynamics $s_{t+1}=As_t + B a_t$, let the total cost starting in state $s_0$ and using the control $a_t = \theta s_t$ be:

$$ J_{\theta}(s_0) = \sum_{t=0}^\infty \gamma^t \left( s_t^\top Q s_t + a_t^\top R a_t \right) = s_0^\top \left[ \sum_{t=0}^\infty \gamma^t ((A+B\theta)^t)^\top (Q + \theta^\top R \theta) (A+B\theta)^t \right] s_0 $$

Assume $\theta$ to be a stabilizing control, i.e. $\rho(A+B\theta) < 1$. I want to show the following:
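For concreteness, the series above can be evaluated numerically by truncating the sum once the terms become negligible. A minimal Python sketch (the helper name `discounted_cost` is my own, not from any library):

```python
import numpy as np

def discounted_cost(A, B, theta, Q, R, gamma, s0, tol=1e-12, max_iter=100000):
    """Sum J_theta(s0) = sum_t gamma^t (s_t' Q s_t + a_t' R a_t) directly,
    where s_{t+1} = (A + B theta) s_t and a_t = theta s_t.
    The series converges iff gamma * rho(A + B theta)^2 < 1."""
    A_cl = A + B @ theta
    stage = Q + theta.T @ R @ theta  # per-step cost matrix Q + theta' R theta
    s = np.asarray(s0, dtype=float)
    J, disc = 0.0, 1.0
    for _ in range(max_iter):
        term = disc * (s @ stage @ s)
        J += term
        if term < tol and disc < 1.0:
            break
        s = A_cl @ s
        disc *= gamma
    return J
```

Note that convergence of the sum only needs $\gamma\,\rho(A+B\theta)^2 < 1$, which is weaker than $\rho(A+B\theta)<1$ when $\gamma<1$; this is exactly where the discount factor complicates the stability question.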

Let $\theta'$ be another linear control such that $J_{\theta'}(s_0) < J_{\theta}(s_0)$ for all starting states $s_0$. Is it true that $\theta'$ is also stabilizing, i.e. that $\rho(A+B\theta') < 1$? If so, how does one prove this kind of result? The discount factor $\gamma$ is stumping me.

Edit: What I am studying here is called policy iteration in dynamic programming. I am also assuming that the optimal controller $\theta^*$ is stabilizing. Policy iteration produces an 'improved' policy, in the sense of lower cost from every state, but I am not sure whether this improved policy is stable.
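For reference, one policy-iteration step for discounted LQR alternates policy evaluation (solving the fixed-point equation $P = Q + \theta^\top R \theta + \gamma (A+B\theta)^\top P (A+B\theta)$) with a greedy improvement obtained by minimizing the one-step cost-to-go over the action. A hedged Python sketch (function names are mine; fixed-point iteration is used for evaluation purely for simplicity):

```python
import numpy as np

def policy_eval(A, B, theta, Q, R, gamma, iters=500):
    """Solve P = Q + theta' R theta + gamma (A+B theta)' P (A+B theta)
    by fixed-point iteration (converges when gamma * rho(A+B theta)^2 < 1)."""
    A_cl = A + B @ theta
    stage = Q + theta.T @ R @ theta
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = stage + gamma * A_cl.T @ P @ A_cl
    return P

def policy_improve(A, B, P, R, gamma):
    """Greedy step: theta_new = -gamma (R + gamma B'PB)^{-1} B'PA,
    obtained by minimizing a'Ra + gamma (As+Ba)' P (As+Ba) over a."""
    return -gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
```

On the scalar example discussed in the answer below, this iteration converges to a limit that is not stabilizing, which is precisely the concern raised in the question.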

It is not true that $\theta'$ has to be a stabilizing state-feedback controller. This would not even be true without the discount factor $\gamma^t$.

For example consider

$$ A = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, $$

$$ Q = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad R = 1, \quad s_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. $$

It can be shown that $\theta =\begin{bmatrix}-2 & 0\end{bmatrix}$ is a stabilizing controller ($\rho(A+B\theta)=0.5$) that yields a strictly positive total cost. However, using $\theta' =\begin{bmatrix}0 & 0\end{bmatrix}$ gives a total cost of exactly zero: starting from $s_0$, the state stays at $s_t = \begin{bmatrix}2^t & 0\end{bmatrix}^\top$, which is invisible to $Q$, and the control effort is zero.
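This counterexample is easy to check numerically by simulating both closed loops from $s_0$ (here with $\gamma=1$, since discounting is not needed for the effect; `cost` is a hypothetical helper):

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 0.5]])
B = np.array([[1.0], [1.0]])
Q = np.array([[0.0, 0.0], [0.0, 1.0]])
R = np.array([[1.0]])
s0 = np.array([1.0, 0.0])
gamma = 1.0  # the counterexample works even without discounting

theta  = np.array([[-2.0, 0.0]])  # stabilizing: rho(A + B theta) = 0.5
thetap = np.array([[0.0, 0.0]])   # not stabilizing: rho(A) = 2

def cost(th, T=200):
    """Truncated total cost sum_t gamma^t (s' Q s + a' R a) from s0."""
    A_cl = A + B @ th
    stage = Q + th.T @ R @ th
    s, J = s0.copy(), 0.0
    for t in range(T):
        J += gamma**t * (s @ stage @ s)
        s = A_cl @ s
    return J

print(max(abs(np.linalg.eigvals(A + B @ theta))))   # 0.5 < 1
print(max(abs(np.linalg.eigvals(A + B @ thetap))))  # 2.0 > 1
print(cost(theta), cost(thetap))                    # ~9.33 vs exactly 0.0
```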

This does not change even if one additionally requires the pair $(A,Q)$ to be detectable. For example, consider the scalar system with $A=2$, $B=1$, $Q=1$, $R=1$ and $\gamma=\tfrac{1}{4}$. Because the system is scalar, the cost function can be simplified using the geometric series to

\begin{align} J_\theta(s_0) &= s_0^2 (Q + \theta^2 R) \sum_{k=0}^\infty \left(\gamma (A + B\,\theta)^2\right)^k, \\ &= \frac{s_0^2 (Q + \theta^2 R)}{1 - \gamma (A + B\,\theta)^2}, \\ &= \frac{s_0^2 (1 + \theta^2)}{1 - \tfrac{1}{4}(2 + \theta)^2}. \end{align}

In order for the geometric series to converge it must hold that $\tfrac{1}{4}(2 + \theta)^2<1$, which implies $-4<\theta<0$. On that interval, $J_\theta(s_0)$ is minimized at $\theta^*=\tfrac{1}{4}(1-\sqrt{17})\approx -0.78$. Note that $\rho(A+B\,\theta^*)=|2+\theta^*|\approx 1.22 \nless 1$, and since $\theta^*$ is the minimizer of $J_\theta(s_0)$, any stabilizing $\theta$ (i.e. $-3<\theta<-1$) yields a strictly larger value of $J_\theta(s_0)$.
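The location of the minimizer can be double-checked numerically with a grid search over the convergence interval $(-4,0)$, using the closed-form expression above (a quick sketch under the stated scalar values):

```python
import numpy as np

# Scalar example: A = 2, B = 1, Q = 1, R = 1, gamma = 1/4
gamma = 0.25

def J(theta, s0=1.0):
    # Closed form via the geometric series; valid only when gamma*(2+theta)^2 < 1.
    return s0**2 * (1 + theta**2) / (1 - gamma * (2 + theta)**2)

# Grid search over the interior of the convergence interval (-4, 0).
thetas = np.linspace(-3.999, -0.001, 400000)
star = thetas[np.argmin(J(thetas))]
print(star)          # ~ (1 - sqrt(17))/4, about -0.78
print(abs(2 + star)) # spectral radius of the closed loop, about 1.22 > 1
```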