Markov Decision Processes and Control Engineering?


In a Markov Decision Process, one has a Markov chain with a regular (column-)stochastic matrix $P$ and a collection of "actions" $a_i$ that one can apply to the system after it transitions via its "drift". In the absence of actions, the system drifts to its open-loop steady-state vector $\pi_{OL}$, which satisfies $P\pi_{OL} = \pi_{OL}$.

My question is, if we model the actions as occurring via some inputs $\{u_1, \ldots, u_k\}$ with matrix $B$, can we (always?) set up a feedback loop with gain matrix $K$ so that the matrix $P-BK$ of the closed-loop system 1) is a regular stochastic matrix and 2) has a prescribed steady-state closed-loop vector $\pi_{CL}$?


What, if anything, is known about this phenomenon?

(I bet if one has $n$ independent inputs $\{u_1, \ldots, u_n\}$ with full-rank matrix $B$, this problem always has a solution gain matrix $K$.)


Best Answer

Let $1_n=\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1\end{bmatrix}$ and consider the system $\begin{cases} X^T1_n = 1_n \\ X\pi_{CL} = \pi_{CL} \end{cases}$ ($X \in \mathbb{R}^{n \times n}$). Any solution $Q$ of this system with $q_{ij} > 0$ is a regular stochastic matrix that has $\pi_{CL}$ as its steady-state vector. Because $\pi_{CL}$ is prescribed, this is not an eigenvalue/eigenvector problem: the constraints are linear in $X$, so it can be solved with linear algebra's reduced row echelon form together with linear programming (to ensure the entries are positive).
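As a sanity check on the linear system above, here is a minimal sketch in plain Python. Rather than running an LP, it uses one explicit positive solution (an assumption of this sketch, not the answer's method): if every entry of $\pi_{CL}$ is strictly positive, the rank-one matrix $Q = \pi_{CL}1_n^T$, whose columns all equal $\pi_{CL}$, satisfies both constraints with $q_{ij} > 0$.

```python
# Sketch: one explicit positive solution of the system
#   X^T 1_n = 1_n  (columns sum to 1),   X pi_CL = pi_CL,
# assuming pi_CL has strictly positive entries: Q = pi_CL * 1_n^T,
# i.e. every column of Q equals pi_CL. (The answer itself finds Q
# via rref + linear programming; this is just the simplest instance.)

def rank_one_Q(pi_cl):
    """Return the n-by-n matrix with q_ij = pi_cl[i] (all columns = pi_CL)."""
    n = len(pi_cl)
    return [[pi_cl[i] for _ in range(n)] for i in range(n)]

pi_cl = [0.5, 0.3, 0.2]          # prescribed closed-loop steady state
Q = rank_one_Q(pi_cl)
n = len(pi_cl)

# Constraint 1: each column of Q sums to 1 (Q^T 1_n = 1_n).
col_sums = [sum(Q[i][j] for i in range(n)) for j in range(n)]

# Constraint 2: Q pi_CL = pi_CL, since Q pi = pi_CL * (1_n^T pi) = pi_CL
# for any probability vector pi.
Qpi = [sum(Q[i][j] * pi_cl[j] for j in range(n)) for i in range(n)]
```

Since every column of $Q$ is the same probability vector, $Q$ maps *every* probability vector to $\pi_{CL}$ in one step; the LP approach instead lets one pick among the many solutions, e.g. one close to $P$.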

If one then sets $K = B^{-1}(P-Q)$ (assuming $B$ is invertible), then $P - BK = Q$, so $K$ is the desired gain matrix for the closed-loop system.
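The gain computation can be sketched in a few lines of plain Python. The $2 \times 2$ matrices $P$, $Q$, and $B$ below are made-up illustration data (with $B$ taken to be the identity for simplicity, the full-rank case the question conjectures about); the point is only the algebra $K = B^{-1}(P-Q)$ and the check $P - BK = Q$.

```python
# Sketch of K = B^{-1}(P - Q) on a 2x2 example, assuming B is invertible.
# Convention: column-stochastic matrices (columns sum to 1), as in P pi = pi.

P = [[0.9, 0.2],
     [0.1, 0.8]]          # open-loop transition matrix (example data)
Q = [[0.7, 0.7],
     [0.3, 0.3]]          # target closed-loop matrix; steady state (0.7, 0.3)
B = [[1.0, 0.0],
     [0.0, 1.0]]          # input matrix, identity here for simplicity

# 2x2 inverse via the adjugate formula (B must be invertible).
a, b = B[0]
c, d = B[1]
det = a * d - b * c
Binv = [[ d / det, -b / det],
        [-c / det,  a / det]]

# K = B^{-1} (P - Q)
PmQ = [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
K = [[sum(Binv[i][k] * PmQ[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Verify the closed loop: P - B K should reproduce Q.
BK = [[sum(B[i][k] * K[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
closed = [[P[i][j] - BK[i][j] for j in range(2)] for i in range(2)]
```

Note that nothing in this step enforces that the entries of $K$ keep $P - BK$ stochastic; that burden falls entirely on choosing $Q$ stochastic in the previous step.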

Here is some MATLAB code implementing the computation of the gain matrix $K$ given $\{P, B, C, D\}$ and $\pi_{CL}$: http://airvigilante194.sdf.org/Scripts/Math%20205/MATLAB/

- MarkovJeff.m is a function that finds a regular stochastic matrix $Q$ with $\pi_{CL}$ as its steady-state vector.
- placeJeff.m is a function that takes the inputs $P$, $B$, and $\pi_{CL}$, calls MarkovJeff.m to find $Q$, then computes the gain matrix $K$.
- closedLoop.m is a script in which one specifies $\{P, B, C, D\}$ and $\pi_{CL}$, and the gain matrix $K$ is computed by calling placeJeff.m.

These files may require functions from https://people.math.wisc.edu/~robbin/minimat/mfiles.zip to run.

http://airvigilante194.sdf.org/Scripts/Math%20205/Octave/ has the same files, adapted for Octave. Again, they may require functions from https://people.math.wisc.edu/~robbin/minimat/mfiles.zip to run.

This http://airvigilante194.sdf.org/Scripts/Math%20205/MATLAB/MarkovChain.slx is a Simulink file implementing the discrete-time Markov chain as an open-loop system; it drifts towards the open-loop steady-state vector $\pi_{OL}$.

Open-Loop System

Here http://airvigilante194.sdf.org/Scripts/Math%20205/MATLAB/MarkovDecisionProcess.slx is a Simulink file implementing the discrete-time Markov decision process as a closed-loop feedback control system; it drifts towards the prescribed closed-loop steady-state vector $\pi_{CL}$.

Closed-Loop Steady-State

(Evidently, Xcos is only able to handle scalar [SISO] signals, so I won't be posting an Xcos file.)