Large-scale finite-horizon discrete-time LQR


Background

The standard finite-horizon discrete-time LQR problem is to minimize the quadratic cost below: \begin{align} \min_{\mathbf{u}_{0}, \mathbf{u}_{1}, \ldots, \mathbf{u}_{T - 1}} J & = \mathbf{x}_{T}^{\top} \mathbf{Q}_{T} \mathbf{x}_{T} + \sum_{t=0}^{T - 1} \left( \mathbf{x}_{t}^{\top} \mathbf{Q}_{t} \mathbf{x}_{t} + \mathbf{u}_{t}^{\top} \mathbf{R}_{t} \mathbf{u}_{t} \right)\\ & \text{s.t.: } \mathbf{x}_{t + 1} = \mathbf{A}_{t} \mathbf{x}_{t} + \mathbf{B}_{t} \mathbf{u}_{t}, \end{align} where $\mathbf{x}_{t} \in \mathbb{R}^{n}$, $\mathbf{u}_{t} \in \mathbb{R}^{m}$, $\mathbf{A}_{t}, \mathbf{Q}_{t} \in \mathbb{R}^{n \times n}$, $\mathbf{B}_{t} \in \mathbb{R}^{n \times m}$, and $\mathbf{R}_{t} \in \mathbb{R}^{m \times m}$. The matrices $\mathbf{Q}_{t}$ and $\mathbf{R}_{t}$ are positive-definite.

The optimal controller is linear and given by: $$\mathbf{u}_{t} = -\mathbf{F}_{t} \mathbf{x}_{t},$$ where: $$\mathbf{F}_{t} = (\mathbf{R}_{t} + \mathbf{B}^{\top}_{t} \mathbf{P}_{t + 1} \mathbf{B}_{t})^{-1} \mathbf{B}^{\top}_{t} \mathbf{P}_{t + 1} \mathbf{A}_{t},$$ and $\mathbf{P}_{t}$ is computed iteratively backward in time by the dynamic Riccati equation, starting from the terminal condition $\mathbf{P}_{T} = \mathbf{Q}_{T}$: \begin{align} \mathbf{P}_{t} & = \mathbf{A}_{t}^{\top} \mathbf{P}_{t + 1} \mathbf{A}_{t} - \mathbf{A}^{\top}_{t} \mathbf{P}_{t + 1} \mathbf{B}_{t} (\mathbf{R}_{t} + \mathbf{B}^{\top}_{t} \mathbf{P}_{t + 1} \mathbf{B}_{t})^{-1} \mathbf{B}^{\top}_{t} \mathbf{P}_{t + 1} \mathbf{A}_{t} + \mathbf{Q}_{t}\\ & = \mathbf{A}_{t}^{\top} \mathbf{P}_{t + 1} (\mathbf{A}_{t} - \mathbf{B}_{t} \mathbf{F}_{t}) + \mathbf{Q}_{t}. \end{align}
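For concreteness, here is a minimal dense sketch of the backward pass above (my own illustration, not from any particular reference; time-invariant $\mathbf{A}, \mathbf{B}, \mathbf{Q}, \mathbf{R}$ are assumed just to keep it short):

```python
# Minimal sketch of the standard backward Riccati pass (dense, small
# dimensions only); the notation follows the equations above.
import numpy as np

def lqr_backward_pass(A, B, Q, R, Q_T, T):
    """Return the gains F_0..F_{T-1} and the cost-to-go matrix P_0."""
    P = Q_T.copy()                      # terminal condition P_T = Q_T
    F = [None] * T
    for t in range(T - 1, -1, -1):      # backward in time
        M = R + B.T @ P @ B             # m x m, cheap to factor
        F[t] = np.linalg.solve(M, B.T @ P @ A)
        P = A.T @ P @ (A - B @ F[t]) + Q
    return F, P                         # P is now P_0

# Scalar sanity check: A = B = Q = R = Q_T = 1, horizon T = 1 gives
# F_0 = (1 + 1)^{-1} * 1 * 1 = 0.5 and P_0 = 1 * (1 - 0.5) + 1 = 1.5.
A = B = Q = R = Q_T = np.array([[1.0]])
F, P0 = lqr_backward_pass(A, B, Q, R, Q_T, T=1)
```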

Problem

Given the solution, one can calculate the auxiliary variable $\mathbf{P}_{t}$ and then obtain the optimal controller gain $\mathbf{F}_{t - 1}$ backward in time. However, the problem I am working with has a very high-dimensional state, $n > 10^{6}$, while the dimension $m$ of the action is much smaller ($m = 10$). In addition, $\mathbf{A}_{t}, \mathbf{Q}_{t}$ and $\mathbf{R}_{t}$ are diagonal matrices. Strictly following the provided solution consistently results in an "out of memory" error due to the storage of the auxiliary matrix $\mathbf{P}_{t}$, which requires $\mathcal{O}(n^{2})$ memory (a dense double-precision $\mathbf{P}_{t}$ with $n = 10^{6}$ would already take about 8 TB).
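One direction worth noting (my own sketch, not something the question assumes): with $\mathbf{A}_{t}$ and $\mathbf{Q}_{t}$ diagonal, the Riccati correction term $\mathbf{A}^{\top} \mathbf{P} \mathbf{B} (\cdot)^{-1} \mathbf{B}^{\top} \mathbf{P} \mathbf{A}$ has rank at most $m$, so $\mathbf{P}_{t}$ can be stored as diagonal-plus-low-rank, with the rank growing by at most $m$ per backward step, instead of ever forming the dense $n \times n$ matrix:

```python
# Sketch (an assumption-laden illustration, not the question's method):
# with diagonal A = diag(a) and Q = diag(q), keep P = diag(d) + U C U^T.
# One backward Riccati step then costs O(n (r + m)^2) with r = U.shape[1],
# and the rank r grows by m per step.
import numpy as np
from scipy.linalg import block_diag

def riccati_step_diag_lowrank(d, U, C, a, q, B, R):
    """One backward step for P_{t+1} = diag(d) + U C U^T; returns the
    representation of P_t and the gain F_t."""
    PB = d[:, None] * B + U @ (C @ (U.T @ B))    # P_{t+1} B, n x m
    M = R + B.T @ PB                             # m x m
    F = np.linalg.solve(M, PB.T * a)             # F_t = M^{-1} B^T P_{t+1} A
    d_new = a * d * a + q                        # diagonal part of A P A + Q
    U_new = a[:, None] * np.hstack([U, PB])      # columns [A U, A P B]
    C_new = block_diag(C, -np.linalg.inv(M))     # append the -rank-m correction
    return d_new, U_new, C_new, F
```

The recursion starts from `d = diag(Q_T)`, `U = np.zeros((n, 0))`, `C = np.zeros((0, 0))`; whether the accumulated rank $T m$ stays manageable depends on the horizon.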

I searched for some large-scale LQR literature, but most of it concerns the infinite-horizon case, where the algebraic Riccati equation is solved approximately, not the difference Riccati equation.

I wonder whether there is a way to handle this large-scale problem, since in the end all we need are the controller gains $\mathbf{F}_{t} \in \mathbb{R}^{m \times n}$, which are reasonable to store in memory.


Update on Oct 13, 2021

After some searching, I found a way to solve directly for the actions $\mathbf{u}_{t}$ without the auxiliary variable $\mathbf{P}_{t}$: eliminate the dynamics constraint by writing each state as a linear function of the initial state and the preceding controls (see MIT 6.832 and Caltech CS159); the optimal actions then come out as a linear function of the initial state. This turns the cost $J$ into an unconstrained quadratic optimization over the stacked controls.
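The batch re-parameterization can be sketched as follows (a small, dense, time-invariant illustration of my own; in the actual large-scale setting one would never form $\mathbf{G}$ and $\mathbf{H}$ explicitly, only their matrix-vector products via forward simulations of the dynamics):

```python
# Sketch of the batch ("shooting") formulation: stack u = [u_0; ...; u_{T-1}]
# and write [x_1; ...; x_T] = G x_0 + H u, so the dynamics constraint
# disappears and J becomes a quadratic in u alone.
import numpy as np

def batch_lqr(A, B, Q, R, Q_T, x0, T):
    n, m = B.shape
    G = np.zeros((T * n, n))
    H = np.zeros((T * n, T * m))
    Apow = np.eye(n)
    for t in range(T):
        Apow = A @ Apow                          # A^{t+1}, maps x_0 to x_{t+1}
        G[t * n:(t + 1) * n] = Apow
        for s in range(t + 1):                   # u_s enters x_{t+1} via A^{t-s} B
            H[t * n:(t + 1) * n, s * m:(s + 1) * m] = \
                np.linalg.matrix_power(A, t - s) @ B
    # Stacked weights: Q on x_1..x_{T-1}, Q_T on x_T, R on every u_t.
    Qbar = np.kron(np.eye(T), Q)
    Qbar[-n:, -n:] = Q_T
    Rbar = np.kron(np.eye(T), R)
    # J(u) = (G x0 + H u)^T Qbar (G x0 + H u) + u^T Rbar u + const(x0);
    # setting the gradient to zero gives a u that is linear in x0.
    u = -np.linalg.solve(Rbar + H.T @ Qbar @ H, H.T @ Qbar @ (G @ x0))
    return u.reshape(T, m)
```

The resulting linear system is only $Tm \times Tm$, which is the appeal when $m \ll n$.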