How to compute the expected value of the cost function in an LQG problem for path tracking?

I have a system whose state is denoted by $x_t$. The state transition mapping for the system is defined as:

$$x_{t+1} = A_t x_t + B_t u_t + w_t$$

where $x_t \in \mathbb{R}^{n \times 1}$, $A_t \in \mathbb{R}^{n \times n}$, $B_t \in \mathbb{R}^{n \times m}$, $u_t \in \mathbb{R}^{m \times 1}$, and $w_t \in \mathbb{R}^{n \times 1}$ with $w_t \sim \mathcal{N}(0, \Sigma_w)$.
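A minimal sketch of one transition of this system in Python with NumPy, assuming time-invariant matrices for brevity. The helper name `step` and all numerical values are hypothetical and serve only as illustration.

```python
import numpy as np

def step(x, u, A, B, Sigma_w, rng):
    """One transition x_{t+1} = A x_t + B u_t + w_t with w_t ~ N(0, Sigma_w)."""
    # Draw the zero-mean Gaussian process noise.
    w = rng.multivariate_normal(np.zeros(A.shape[0]), Sigma_w)
    return A @ x + B @ u + w

# Illustrative dimensions and matrices (assumptions, not from the question).
rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Sigma_w = 0.1 * np.eye(n)

x0 = np.zeros(n)
u0 = np.ones(m)
x1 = step(x0, u0, A, B, Sigma_w, rng)
```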

I want to drive my system to a given reference state $\bar{x}_t$ for all $t = 0, 1, \dots, T$. The linear-quadratic-Gaussian (LQG) control problem for this task is expressed as:

$$\min\limits_{\pi \in \Pi} \mathbb{E}\big[ (x^\pi_T - \bar{x}_T)^\top Q_T (x^\pi_T - \bar{x}_T) + \sum_{t=0}^{T-1} (x^\pi_t - \bar{x}_t)^\top Q_t (x^\pi_t - \bar{x}_t) + (u_t^\pi)^\top R_t u_t^\pi \big]$$

where $\pi$ is a given policy, $Q_t$ is a positive semidefinite matrix, and $R_t$ is a positive definite matrix. Both matrices represent costs, so the optimization problem minimizes the expected deviation from the reference trajectory together with the control effort.

I want to use the dynamic programming approach to provide an explicit characterization of the Markovian optimal policy and the optimal value functions for this problem.

First, I characterized the stage costs and the final cost. The final cost at time $t = T$ is $(x^\pi_T - \bar{x}_T)^\top Q_T (x^\pi_T - \bar{x}_T)$ while the stage cost is expressed as $(x^\pi_t - \bar{x}_t)^\top Q_t (x^\pi_t - \bar{x}_t) + (u_t^\pi)^\top R_t u_t^\pi$.

With this information, I can define the optimal value function $V^*_t(x)$ at any time $t$:

\begin{align} V^*_T(x) &= (x^\pi_T - \bar{x}_T)^\top Q_T (x^\pi_T - \bar{x}_T) \\ V^*_t(x) &= \min_{u_t} \ \Big[(x^\pi_t - \bar{x}_t)^\top Q_t (x^\pi_t - \bar{x}_t) + (u_t^\pi)^\top R_t u_t^\pi + \mathbb{E}[V^*_{t+1}(A_t x_t + B_t u_t + w_t)]\Big] \end{align}

For time $t = T$, the result is defined above. For $t = T - 1$, we have:

\begin{align} V^*_{T-1}(x) &= \min_{u_{T-1}} \ \Big[(x^\pi_{T-1} - \bar{x}_{T-1})^\top Q_{T-1} (x^\pi_{T-1} - \bar{x}_{T-1}) + (u_{T-1}^\pi)^\top R_{T-1} u_{T-1}^\pi + \mathbb{E}[V^*_{T}(A_{T-1} x_{T-1} + B_{T-1} u_{T-1} + w_{T-1})]\Big] \end{align}

Since the final cost is given by $V^*_T$, the expectation term inside the optimal value function for $t = T - 1$ can be written as:

\begin{align} \mathbb{E}\Big[(A_{T-1} x_{T-1} + B_{T-1} u_{T-1} + w_{T-1} - \bar{x}_{T-1})^\top Q_T (A_{T-1} x_{T-1} + B_{T-1} u_{T-1} + w_{T-1} - \bar{x}_{T-1}) \Big] \end{align}

And this is the first obstacle I am facing. I don't know how to proceed from here.

I don't think that $(A_{T-1} x_{T-1})^\top Q_{T} (A_{T-1} x_{T-1}) + (A_{T-1} x_{T-1})^\top Q_{T} (B_{T-1} u_{T-1}) + \dots$ would be the answer.

How should this series of matrix multiplications be carried out? Since the expected value of $w$ is zero, the computation should simplify, but I still don't know how to proceed. I would appreciate it if someone could point out the next step or provide a helpful reference. Thank you.

Best answer

Yes, you can expand the product just as you would with real numbers, as long as you keep in mind that matrix multiplication is not commutative, so the order of the factors must be preserved.
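As a quick numerical illustration, here is a small sketch in Python with NumPy that expands a quadratic form in a sum of two vectors while keeping the factor order. The vectors `a`, `b` and the matrices `Q`, `Q_sym` are arbitrary stand-ins introduced here, not quantities from the question.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = rng.standard_normal(n)
b = rng.standard_normal(n)
Q = rng.standard_normal((n, n))  # deliberately not symmetric

# (a + b)^T Q (a + b) expands into four terms, each keeping its factor order.
lhs = (a + b) @ Q @ (a + b)
rhs = a @ Q @ a + a @ Q @ b + b @ Q @ a + b @ Q @ b

# When the weight matrix is symmetric (as cost matrices usually are),
# the two cross terms coincide and can be merged into 2 a^T Q b.
Q_sym = Q + Q.T
cross_ab = a @ Q_sym @ b
cross_ba = b @ Q_sym @ a
```

Here `lhs` and `rhs` agree up to floating-point error, and `cross_ab` equals `cross_ba` only because `Q_sym` is symmetric.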

If you want to simplify, you can rewrite

\begin{align} A_{T-1} x_{T-1} + B_{T-1} u_{T-1} + w_{T-1} - \bar{x}_{T-1} \end{align}

as $M_{T-1}z_{T-1}$ where

\begin{equation} M_{T-1}:=\begin{bmatrix} A_{T-1} & B_{T-1} & I & -I \end{bmatrix} \end{equation} and \begin{equation} z_{T-1}:=\begin{bmatrix} x_{T-1}\\ u_{T-1}\\ w_{T-1}\\ \bar{x}_{T-1} \end{bmatrix}. \end{equation}

Then, you will get

\begin{align} \mathbb{E}\left[z_{T-1}^TM_{T-1}^TQ_{T}M_{T-1}z_{T-1}\right]=\mathbb{E}\left[z_{T-1}^T \begin{bmatrix} A_{T-1}^TQ_TA_{T-1} & A_{T-1}^TQ_TB_{T-1} & A_{T-1}^TQ_T & -A_{T-1}^TQ_T\\ B_{T-1}^TQ_TA_{T-1} & B_{T-1}^TQ_TB_{T-1} & B_{T-1}^TQ_T & -B_{T-1}^TQ_T\\ Q_TA_{T-1} & Q_TB_{T-1} & Q_T & -Q_T\\ -Q_TA_{T-1} & -Q_TB_{T-1} & -Q_T & Q_T \end{bmatrix} z_{T-1}\right] \end{align}

Every term that is linear in $w$ vanishes, because the noise has zero mean and is independent of the other signals. What remains of the noise is the quadratic term $\mathbb{E}[w^TQ_Tw]$, which is equal to the trace of $Q_T\Sigma_w$. In the end, you will get

\begin{align} \mathbb{E}\left[z_{T-1}^TM_{T-1}^TQ_{T}M_{T-1}z_{T-1}\right]=\mathbb{E}\left[z_{T-1}^T \begin{bmatrix} A_{T-1}^TQ_TA_{T-1} & A_{T-1}^TQ_TB_{T-1} & 0 & -A_{T-1}^TQ_T\\ B_{T-1}^TQ_TA_{T-1} & B_{T-1}^TQ_TB_{T-1} & 0 & -B_{T-1}^TQ_T\\ 0 & 0 & 0 & 0\\ -Q_TA_{T-1} & -Q_TB_{T-1} & 0 & Q_T \end{bmatrix} z_{T-1}\right]+\mathrm{trace}(Q_T\Sigma_w) \end{align}
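The trace term comes from the cyclic property of the trace: $\mathbb{E}[w^TQ_Tw]=\mathbb{E}[\mathrm{trace}(Q_Tww^T)]=\mathrm{trace}(Q_T\mathbb{E}[ww^T])=\mathrm{trace}(Q_T\Sigma_w)$. The final formula can also be sanity-checked numerically. The following is a minimal sketch in Python with NumPy, assuming arbitrary random matrices and vectors (all values and variable names are illustrative assumptions). It compares the closed form against a Monte Carlo estimate of the expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

# Illustrative problem data (assumptions, not from the question).
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
G = rng.standard_normal((n, n))
Q = G @ G.T                           # positive semidefinite weight
L = 0.3 * rng.standard_normal((n, n))
Sigma_w = L @ L.T                     # noise covariance
x = rng.standard_normal(n)
u = rng.standard_normal(m)
xbar = rng.standard_normal(n)         # reference entering the terminal cost

# Closed form: block matrix M^T Q M with the w-rows and w-columns zeroed,
# plus trace(Q Sigma_w).
M = np.hstack([A, B, np.eye(n), -np.eye(n)])
P = M.T @ Q @ M
w_idx = slice(n + m, 2 * n + m)       # position of w inside z = [x; u; w; xbar]
P0 = P.copy()
P0[w_idx, :] = 0.0
P0[:, w_idx] = 0.0
z = np.concatenate([x, u, np.zeros(n), xbar])  # the w slot no longer matters
closed = z @ P0 @ z + np.trace(Q @ Sigma_w)

# Same quantity written without the stacked vector.
a = A @ x + B @ u - xbar
direct = a @ Q @ a + np.trace(Q @ Sigma_w)

# Monte Carlo estimate of E[(a + w)^T Q (a + w)].
N = 200_000
W = rng.multivariate_normal(np.zeros(n), Sigma_w, size=N)
E = a + W                             # shape (N, n), one sample per row
mc = np.mean(np.einsum("ij,jk,ik->i", E, Q, E))
```

With this many samples, `mc` should land close to `closed`, and `closed` matches `direct` up to floating-point error. The stacked form is only a bookkeeping device for the same expansion.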