The following question asks for the shortest path from the start to the end of the maze below, formulated as a Markov decision process.

We formulate the problem as follows:
- The state space $\mathcal{S}$ consists of the white blocks in the figure; each element is written as a pair $(i, j)$.
- The action space is $A=\{N, S, W, E\}$, where $N, S, W$, and $E$ denote moving north, south, west, and east, respectively. If the chosen direction is blocked, the state does not change. For example, $$ \begin{aligned} &P((1,3) \mid(1,4), S)=1, \quad P((4,4) \mid(3,4), E)=1 \\ &P((1,5) \mid(1,5), N)=1, \quad P((4,3) \mid(4,4), S)=1 \end{aligned} $$
- $(1,5)$ is the initial state and $(5,1)$ is the terminal state.
- $R\left(s, a, s^{\prime}\right)=1$ if $s^{\prime}=(5,1)$ and 0 otherwise.
- The discount factor is $\gamma=0.9$. Initialize $V_{0}(s)=0$ for all $s \in \mathcal{S}$. Using the Bellman optimality operator $$ (L v)(s)=\sup _{a \in A} \mathbf{E}_{s^{\prime} \sim P(\cdot \mid s, a)}\left[R\left(s, a, s^{\prime}\right)+\gamma \cdot v\left(s^{\prime}\right)\right] $$ set $$ V_{n+1}(s)=\left(L V_{n}\right)(s), \quad n=0,1,2, \cdots . $$ A policy $\pi$ is said to be a greedy policy induced by a value function $V$ if $$ \pi(s) \in \operatorname{argmax}_{a \in A} Q(s, a) $$ where $Q(s, a)=\mathbf{E}_{s^{\prime}}\left[R\left(s, a, s^{\prime}\right)+\gamma V\left(s^{\prime}\right)\right] .$
How can we find the minimum $n \in \mathbb{N}$ such that every greedy policy induced by $V_{n}$ is an optimal policy?
$\textbf{Note}$: $\pi^{*}$ is said to be an optimal policy if $V^{\pi^{*}}(s)=V^{*}(s)$ for all $s \in \mathcal{S}$, where $V^{*}$ is the optimal value function.
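The search for the minimal $n$ can be sketched in code: run value iteration until the iterates stabilize (which gives $V^{*}$ exactly on this finite deterministic maze), then find the first $n$ at which every greedy action under $V_{n}$ is also greedy under $V^{*}$; a greedy policy is optimal exactly when all of its chosen actions are $V^{*}$-greedy. Note the maze figure is not reproduced in the text, so the layout below is a placeholder assumption: all 25 cells are treated as white and the `WALLS` set is empty. To use the real maze, drop the black cells from `STATES` and list the blocked `(state, action)` pairs in `WALLS`.

```python
# Value-iteration sketch for the maze MDP above.
# ASSUMPTION: open 5x5 grid (no interior walls) as a stand-in for the figure.

GAMMA = 0.9
GOAL = (5, 1)
STATES = [(i, j) for i in range(1, 6) for j in range(1, 6)]  # (i, j): i grows to the east, j to the north
NONTERMINAL = [s for s in STATES if s != GOAL]
MOVES = {"N": (0, 1), "S": (0, -1), "W": (-1, 0), "E": (1, 0)}
WALLS = set()  # placeholder: fill with blocked (state, action) pairs from the figure

def step(s, a):
    """Deterministic transition: stay put if the move is blocked or leaves the grid."""
    if (s, a) in WALLS:
        return s
    nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    return nxt if nxt in set(STATES) else s

def q(V, s, a):
    """Q(s, a) = E[R(s, a, s') + gamma * V(s')]; transitions are deterministic."""
    s2 = step(s, a)
    return (1.0 if s2 == GOAL else 0.0) + GAMMA * V[s2]

def bellman(V):
    """One application of the Bellman optimality operator L; the terminal state keeps value 0."""
    W = dict(V)
    for s in NONTERMINAL:
        W[s] = max(q(V, s, a) for a in MOVES)
    return W

def greedy_set(V, s, tol=1e-12):
    """All actions attaining max_a Q(s, a); any selection from these sets is a greedy policy induced by V."""
    best = max(q(V, s, a) for a in MOVES)
    return {a for a in MOVES if q(V, s, a) >= best - tol}

# Iterate V_{n+1} = L V_n from V_0 = 0; the iterates equal V* once n exceeds
# the length of the longest shortest path in the maze.
V = {s: 0.0 for s in STATES}
history = [V]
for _ in range(50):
    V = bellman(V)
    history.append(V)
V_star = history[-1]
optimal = {s: greedy_set(V_star, s) for s in NONTERMINAL}

# Smallest n such that EVERY greedy policy induced by V_n is optimal:
# each state's greedy set must contain only V*-greedy actions.
n_min = next(n for n, Vn in enumerate(history)
             if all(greedy_set(Vn, s) <= optimal[s] for s in NONTERMINAL))
print(n_min)
```

Intuitively, $V_{n}$ is nonzero only within $n$ steps of the goal, so states farther away still have all four actions tied at $Q=0$ and admit suboptimal greedy choices; $n$ must therefore grow until every state "sees" the goal through its value function.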