Markov Chain question?


Let $M=(S,A,\{A(s)\}_{s\in S},P)$ be a Markov chain, where $S \subset \mathbb{N}$ is the set of states and $A$ is a finite set of actions. For each $s \in S$, $A(s) \subset A$ is the non-empty set of admissible actions at state $s$.

OK, this is a construction that my teacher showed us, but I don't understand the idea of the set $A(s)$. What is the purpose of this set? Why is it called the set of admissible actions? What is the difference between it and the set of actions?

Along with this there is an observation that says:

Without loss of generality we may take $A = \cup_{s \in S} A(s)$. Then $\mathbb{K}=\{(s,a) \mid s \in S,\ a \in A(s)\}$ is the set of admissible state-action pairs, and $p_{j|ik}$ is a stationary controlled transition matrix (why is it controlled? what does that mean?), which defines a stochastic kernel on $S$ given $\mathbb{K}$ (I don't understand this implication), where

$$p_{j|ik}:= P(X_{t+1}=s_j|X_t=s_i,A_t=a_k), \ \ \forall t \in \mathbb{N} $$

So is this the probability associated with the transition from state $s_i$ to state $s_j$, $i,j=1,\dots,n$, under an action $a_k \in A(s_i)$?

Could someone explain this construction or definition of a Markov chain, please?

Thanks for your help and time everyone.


Perhaps I can explain this concept more clearly with an example. As a brief note, what you are describing is not a Markov chain, but rather a Markov decision process (MDP). Consider the following MDP:

A robot is allowed to move on a grid of dimension $(n,n)$. It starts on the grid at $(0,0)$ and at each timestep may choose to move to a square that is either one unit away vertically or one unit away horizontally, so long as it stays on the grid. After some time, it might look like this:

|------------|------------|------------|-----------|
|            |            |            |           |
|      (0,0) |      (0,1) |        ... |           |
|------------|------------|------------|-----------|
|            |            |            |           |
|      (1,0) |            |            |           |
|------------|------------|------------|-----------|
|            |            |            |           |
|        ... |            |            |           |
|------------|------------|------------|-----------|
|            |            |            |   O_O     |
|            |            |            |   \||     |
|------------|------------|------------|-----------|
|            |            |            |           |
|            |            |            |           |
|------------|------------|------------|-----------|

What are all the states? We can denote them by their coordinates: $(a,b)$ for $a,b \in \{0,1,\dots,n-1\}$. What is the set of all actions? That is simply $\{up, left, right, down\}$. However, the robot cannot perform all of those actions from every state. For example, in its starting state it may only move down or right, as it is restricted to staying on the grid. Thus we can say $A((0,0)) = \{down, right\}$, as those are the only admissible, or "allowed", actions in state $(0,0)$, which is clearly different from the set of all actions. Furthermore, one can say that $\cup_{s \in S} A(s) = A$ for any MDP.
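The admissible-action map $A(s)$ for this grid can be sketched in a few lines of Python. This is only an illustration (the function and action names are made up, not from any library): for each state, it keeps exactly the moves that stay on the grid.

```python
# Illustrative sketch of A(s) for the n-by-n grid robot.
# States are coordinates (a, b) with 0 <= a, b < n; the full action set
# is {"up", "down", "left", "right"}; A(s) keeps the moves that stay
# on the grid. Row index grows downward, matching the grid picture.

MOVES = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}

def admissible_actions(state, n):
    """Return A(s): the subset of actions that keep the robot on the grid."""
    a, b = state
    allowed = set()
    for action, (da, db) in MOVES.items():
        if 0 <= a + da < n and 0 <= b + db < n:
            allowed.add(action)
    return allowed

# In the corner state (0, 0), only down and right are admissible:
print(admissible_actions((0, 0), n=4))  # {"down", "right"}
# An interior state admits all four actions:
print(admissible_actions((1, 1), n=4))
```

Note that $A(s)$ equals the full action set only at interior states; corners and edges get strictly smaller sets, which is exactly why the distinction between $A$ and $A(s)$ matters.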

As for the probabilities, it may be the case that in certain states the robot prefers certain actions, or that a given action does not always lead to the same square. Suppose we observed the robot for a very, very long time and recorded, for each state-action pair, the fraction of the time it transitioned to each state $s'$. We could write out a matrix representing these transition probabilities, one distribution per admissible pair $(s,a)$. This family of distributions is the stochastic kernel, and the transition matrix is "controlled" precisely because the distribution of the next state depends not only on the current state but also on the action chosen there.
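The kernel property can be stated concretely: for every admissible pair $(i,a_k) \in \mathbb{K}$, the row $p_{\cdot|ik}$ must be a probability distribution over $S$. Here is a minimal sketch on a tiny two-state example; the state names, action names, and probabilities below are invented purely for illustration.

```python
# Sketch of a stationary controlled transition law p(j | i, k), stored as
# a dict keyed by the admissible state-action pairs (i, a) in K.
# "Controlled" means each action at a state selects its own distribution
# over next states. All numbers here are made up for illustration.

P = {
    # (state, action): {next_state: probability}
    (0, "stay"): {0: 1.0},
    (0, "go"):   {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "back"): {0: 0.9, 1: 0.1},
}

def is_stochastic_kernel(P, tol=1e-9):
    """Check that p(. | i, a) is a probability distribution over states
    for every admissible pair (i, a): nonnegative entries summing to 1."""
    return all(
        abs(sum(dist.values()) - 1.0) < tol
        and all(p >= 0 for p in dist.values())
        for dist in P.values()
    )

print(is_stochastic_kernel(P))  # True
```

"Stationary" is the further requirement that this table does not change with $t$: the same $P$ is used at every timestep, which is exactly what the $\forall t \in \mathbb{N}$ in the definition of $p_{j|ik}$ expresses.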