Understanding a Markov decision process

48 Views Asked by Bumbble Comm At 17 Apr 2026 - 1:41

We have an insect that rests on a vertex of a square at each time $t=0,1,2,\dots$. The vertices are labelled 1 to 4: 1 is the lower left vertex, 2 the lower right, 3 the upper left, and 4 the upper right. At every time $t=0,1,2,\dots$ the insect makes a jump. If possible he jumps to the right; otherwise he jumps to one of the two neighboring vertices, each with probability $0.5$. We use $X_t$ to denote the vertex the insect rests on at time $t$, so $X_t \in \{1,2,3,4\}$. Then $\{X_t \mid t=0,1,2,\dots\}$ is a Markov chain with the following transition matrix:

$$\begin{pmatrix}0 & 1 & 0 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 0 & 0 & 1 \\ 0 & 0.5 & 0.5 & 0 \end{pmatrix} $$

Now we also assume that the insect has the option to stay on his current vertex. If the insect stays on vertex $i$, the payoff is \$1 (for every $i$); if he jumps from vertex $i$, the payoff is $i-2$ (for $1 \leq i \leq 4$).

So for this problem the set of states is $S = \{1,2,3,4\}$. The action set has two elements: we denote "the insect stays" by 1 and "the insect jumps" by 2, so $A(i) = \{1,2\}$ for $1 \leq i \leq 4$. Writing $r_i(a)$ for the payoff of action $a$ at vertex $i$, we have $r_i(1) = 1$ and $r_i(2) = i-2$.

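We write $p_{ij}(a)$ for the probability that an insect who chooses action $a$ at vertex $i$ moves to vertex $j$; for example, $p_{11}(1)$ is the probability that an insect who chooses action 1 goes from vertex 1 to vertex 1. We get the following: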

$ p_{11}(1) = p_{22}(1) = p_{33}(1) = p_{44}(1) = 1$, all other $p_{ij}(1) = 0$ where $ 1 \leq i,j \leq 4$.

$p_{11}(2) = 0, p_{12}(2) = 1, p_{13}(2) = p_{14}(2) = 0; p_{21}(2) = 0.5, p_{22}(2) = p_{23}(2) = 0, p_{24}(2) = 0.5; p_{31}(2) = p_{32}(2) = p_{33}(2) = 0, p_{34}(2) = 1; p_{41}(2) = 0, p_{42}(2) = p_{43}(2) = 0.5, p_{44}(2) = 0$
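To make the model concrete, here is a minimal sketch of how the states, actions, transition probabilities and payoffs above could be written down in Python. The names `STATES`, `ACTIONS`, `JUMP`, `p` and `r` are my own choices for illustration, not part of the problem statement.

```python
# States are the vertices 1..4; actions are 1 = stay, 2 = jump.
STATES = [1, 2, 3, 4]
ACTIONS = [1, 2]

# Transition matrix for the "jump" action, copied from the Markov chain above.
# Row i-1 holds the probabilities of moving from vertex i to vertices 1..4.
JUMP = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.5, 0.0],
]


def p(i, j, a):
    """Transition probability p_ij(a) for vertices i, j and action a."""
    if a == 1:
        # Staying keeps the insect on the same vertex with probability 1.
        return 1.0 if i == j else 0.0
    return JUMP[i - 1][j - 1]


def r(i, a):
    """Immediate payoff at vertex i: 1 for staying, i - 2 for jumping."""
    return 1.0 if a == 1 else float(i - 2)


# Sanity check: for every vertex and action the probabilities sum to 1.
for a in ACTIONS:
    for i in STATES:
        assert abs(sum(p(i, j, a) for j in STATES) - 1.0) < 1e-12
```

Keeping `p` and `r` as small functions mirrors the $p_{ij}(a)$ notation directly, so any later computation can be read off against the formulas.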

I understand the setup and the idea behind it, but how would one use this in practice? For example, suppose I want to ask: over an infinite time horizon, what is the optimal strategy for this insect?

How would I use this model to answer questions like that?
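One possible way to approach such a question is value iteration. The sketch below assumes the insect maximizes the expected discounted sum of payoffs with a discount factor `GAMMA = 0.9`. That criterion and that value are assumptions of mine and are not given in the problem; an average-payoff criterion would need a different method. The names `JUMP`, `q_value` and `value_iteration` are hypothetical helpers, with actions coded as 1 = stay and 2 = jump as above.

```python
STATES = [1, 2, 3, 4]
ACTIONS = [1, 2]   # 1 = stay, 2 = jump
GAMMA = 0.9        # ASSUMED discount factor; not part of the original problem

# Jump transition matrix from the question (row i-1: from vertex i).
JUMP = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.5, 0.0],
]


def q_value(i, a, V):
    """Payoff now plus discounted expected value of the next vertex."""
    if a == 1:
        # Stay: payoff 1, and the insect remains on vertex i.
        return 1.0 + GAMMA * V[i]
    # Jump: payoff i - 2, next vertex drawn from row i of JUMP.
    expected_next = sum(JUMP[i - 1][j - 1] * V[j] for j in STATES)
    return (i - 2) + GAMMA * expected_next


def value_iteration(tol=1e-10, max_iter=10_000):
    """Repeat the Bellman optimality update until the values stop changing."""
    V = {i: 0.0 for i in STATES}
    for _ in range(max_iter):
        new_V = {i: max(q_value(i, a, V) for a in ACTIONS) for i in STATES}
        change = max(abs(new_V[i] - V[i]) for i in STATES)
        V = new_V
        if change < tol:
            break
    # Greedy policy: in each vertex pick the action with the larger value.
    policy = {i: max(ACTIONS, key=lambda a: q_value(i, a, V)) for i in STATES}
    return V, policy


V, policy = value_iteration()
for i in STATES:
    print(i, "stay" if policy[i] == 1 else "jump", round(V[i], 4))
```

The resulting `policy` maps each vertex to an action, which is what a stationary strategy for the insect looks like under this criterion. The answer can change with `GAMMA`, so it is worth rerunning with other values.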
