Markov Decision Processes and Control Engineering?


In a Markov Decision Process, one has a Markov chain with a regular (column-)stochastic matrix $P$ and a collection of "actions" $a_i$ that one can apply to the system after it transitions via its "drift". In the absence of actions, the system drifts to its open-loop steady-state vector $\pi_{OL}$, which satisfies $P\pi_{OL} = \pi_{OL}$.

My question is, if we model the actions as occurring via some inputs $\{u_1, \ldots, u_k\}$ with matrix $B$, can we (always?) set up a feedback loop with gain matrix $K$ so that the matrix $P-BK$ of the closed-loop system 1) is a regular stochastic matrix and 2) has a prescribed steady-state closed-loop vector $\pi_{CL}$?


What, if anything, is known about this phenomenon?

(I bet if one has $n$ independent inputs $\{u_1, \ldots, u_n\}$ with full-rank matrix $B$, this problem always has a solution gain matrix $K$.)


Best Answer

Let $1_n=\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1\end{bmatrix}$ and consider the system $\begin{cases} X^T1_n = 1_n \\ X\pi_{CL} = \pi_{CL} \end{cases}$ ($X \in \mathbb{R}^{n \times n}$). Any solution $Q$ of this system with $q_{ij} > 0$ is a regular stochastic matrix that has $\pi_{CL}$ as its steady-state vector. Because $\pi_{CL}$ is prescribed, this is not an eigenvalue/eigenvector problem: the constraints are linear in $X$, so it can be solved with linear algebra's reduced row echelon form together with linear programming (to ensure the entries are positive).
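As a sanity check on the linear system above, here is a minimal sketch in plain Python. Rather than running an LP, it uses one explicit positive solution (an assumption of this sketch, not the answer's method): if every entry of $\pi_{CL}$ is strictly positive, the rank-one matrix $Q = \pi_{CL}1_n^T$, whose columns all equal $\pi_{CL}$, satisfies both constraints with $q_{ij} > 0$.

```python
# Sketch: one explicit positive solution of the system
#   X^T 1_n = 1_n  (columns sum to 1),   X pi_CL = pi_CL,
# assuming pi_CL has strictly positive entries: Q = pi_CL * 1_n^T,
# i.e. every column of Q equals pi_CL. (The answer itself finds Q
# via rref + linear programming; this is just the simplest instance.)

def rank_one_Q(pi_cl):
    """Return the n-by-n matrix with q_ij = pi_cl[i] (all columns = pi_CL)."""
    n = len(pi_cl)
    return [[pi_cl[i] for _ in range(n)] for i in range(n)]

pi_cl = [0.5, 0.3, 0.2]          # prescribed closed-loop steady state
Q = rank_one_Q(pi_cl)
n = len(pi_cl)

# Constraint 1: each column of Q sums to 1 (Q^T 1_n = 1_n).
col_sums = [sum(Q[i][j] for i in range(n)) for j in range(n)]

# Constraint 2: Q pi_CL = pi_CL, since Q pi = pi_CL * (1_n^T pi) = pi_CL
# for any probability vector pi.
Qpi = [sum(Q[i][j] * pi_cl[j] for j in range(n)) for i in range(n)]
```

Since every column of $Q$ is the same probability vector, $Q$ maps *every* probability vector to $\pi_{CL}$ in one step; the LP approach instead lets one pick among the many solutions, e.g. one close to $P$.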

If one then sets $K = B^{-1}(P-Q)$ (assuming $B$ is invertible), then $P - BK = Q$, so $K$ is the desired gain matrix for the closed-loop system.
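The gain computation can be sketched in a few lines of plain Python. The $2 \times 2$ matrices $P$, $Q$, and $B$ below are made-up illustration data (with $B$ taken to be the identity for simplicity, the full-rank case the question conjectures about); the point is only the algebra $K = B^{-1}(P-Q)$ and the check $P - BK = Q$.

```python
# Sketch of K = B^{-1}(P - Q) on a 2x2 example, assuming B is invertible.
# Convention: column-stochastic matrices (columns sum to 1), as in P pi = pi.

P = [[0.9, 0.2],
     [0.1, 0.8]]          # open-loop transition matrix (example data)
Q = [[0.7, 0.7],
     [0.3, 0.3]]          # target closed-loop matrix; steady state (0.7, 0.3)
B = [[1.0, 0.0],
     [0.0, 1.0]]          # input matrix, identity here for simplicity

# 2x2 inverse via the adjugate formula (B must be invertible).
a, b = B[0]
c, d = B[1]
det = a * d - b * c
Binv = [[ d / det, -b / det],
        [-c / det,  a / det]]

# K = B^{-1} (P - Q)
PmQ = [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
K = [[sum(Binv[i][k] * PmQ[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Verify the closed loop: P - B K should reproduce Q.
BK = [[sum(B[i][k] * K[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
closed = [[P[i][j] - BK[i][j] for j in range(2)] for i in range(2)]
```

Note that nothing in this step enforces that the entries of $K$ keep $P - BK$ stochastic; that burden falls entirely on choosing $Q$ stochastic in the previous step.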

Here is some MATLAB code implementing the computation of the gain matrix $K$ given $\{P, B, C, D\}$ and $\pi_{CL}$: http://airvigilante194.sdf.org/Scripts/Math%20205/MATLAB/

- MarkovJeff.m is a function that finds a regular stochastic matrix $Q$ with $\pi_{CL}$ as its steady-state vector.
- placeJeff.m is a function that takes the inputs $P$, $B$, and $\pi_{CL}$, calls MarkovJeff.m to find $Q$, then computes the gain matrix $K$.
- closedLoop.m is a script in which one specifies $\{P, B, C, D\}$ and $\pi_{CL}$, and the gain matrix $K$ is computed by calling placeJeff.m.

These files may require functions from https://people.math.wisc.edu/~robbin/minimat/mfiles.zip to run.

http://airvigilante194.sdf.org/Scripts/Math%20205/Octave/ has the same files, adapted for Octave. Again, they may require functions from https://people.math.wisc.edu/~robbin/minimat/mfiles.zip to run.

This http://airvigilante194.sdf.org/Scripts/Math%20205/MATLAB/MarkovChain.slx is a Simulink file implementing the discrete-time Markov chain as an open-loop system; it drifts towards the open-loop steady-state vector $\pi_{OL}$.

Open-Loop System

Here http://airvigilante194.sdf.org/Scripts/Math%20205/MATLAB/MarkovDecisionProcess.slx is a Simulink file implementing the discrete-time Markov decision process as a closed-loop feedback control system; it drifts towards the prescribed closed-loop steady-state vector $\pi_{CL}$.

Closed-Loop Steady-State

(Evidently, Xcos is only able to handle scalar [SISO] signals, so I won't be posting an Xcos file.)