I just read a paper by the authors Kohlberg and Neyman stating that "A single-person stochastic game is known as a Markov Decision Process (MDP)." Does anyone know if the following extension to $n$ players might work?
Take, say, 2 players playing a game with actions $a\in A$ on a state space $S$. By inflating the state space to $S^2$ and the action space to $A^2$, both players' positions and actions can be accounted for. The result is again an MDP, only with different state and action spaces.
Does this constitute an MDP with 2 (or $n$) players, or am I overlooking anything?
Thanks in advance,
Leon
I am not sure why you have pairs of states. Having more than one player does not mean you get more states. Every agent observes the same state; it's just that in any particular state only one of the players gets to decide on an action.
So, I have come up with the following: an $l$-player MDP is $ ((S_i)_{i\in[l]},P,A,(R_i)_{i\in[l]},\gamma) $ where the pieces mean the following:

- $(S_i)_{i\in[l]}$ is a partition of the state space $S=\bigcup_{i\in[l]} S_i$, where $S_i$ is the set of states in which it is player $i$'s turn to act,
- $A$ is the set of actions,
- $P(s'|s,a)$ is the probability of moving to state $s'$ when action $a$ is taken in state $s$,
- $R_i(s,a)$ is the reward player $i$ receives when action $a$ is taken in state $s$,
- $\gamma\in[0,1)$ is the discount factor.
Now, let's define the value function $v_i$ for each player $i$, where $v_i(s)$ is the expected discounted sum of rewards for player $i$ from state $s$ onwards. If $s\in S_j$, then $$ v_i(s) = R_i(s,a(s)) + \sum_{s'} P(s'|s, a(s) )\cdot \gamma v_i(s'), $$ where $$a(s) = \operatorname*{arg\,max}_{a}\; R_j(s,a) + \sum_{s'} P(s'|s,a )\cdot \gamma v_j(s'). $$
So basically, the value of state $s$ for player $i$ is player $i$'s reward plus the value of the expected next state, where the action is chosen by the player whose turn it is in state $s$. Note that if $i=j$, i.e. if it's player $i$'s own turn, the formula above becomes $$ v_i(s) = \max_{a}\; R_i(s,a) + \sum_{s'} P(s'|s, a )\cdot \gamma v_i(s'), $$ which is exactly the Bellman optimality equation of a $1$-player MDP.
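The recursion above can be sketched as a value-iteration loop over a small random instance. Everything here (the state/action counts, the `controller` assignment, the random `P` and `R`) is a made-up placeholder, and note that this greedy best-response iteration is only guaranteed to converge in special cases (e.g. a single player, or zero-sum turn-based games), so the loop caps the number of sweeps:

```python
import numpy as np

# A minimal sketch of the turn-based l-player recursion (illustrative sizes).
n_players, n_states, n_actions = 2, 4, 2
gamma = 0.9
rng = np.random.default_rng(0)

controller = np.array([0, 1, 0, 1])  # controller[s] = j such that s is in S_j
# P[s, a, s'] = P(s' | s, a): each (s, a) row is a distribution over s'
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
# R[i, s, a] = R_i(s, a): per-player rewards
R = rng.random((n_players, n_states, n_actions))

v = np.zeros((n_players, n_states))  # v[i, s] = v_i(s)
for _ in range(500):
    v_new = np.empty_like(v)
    for s in range(n_states):
        j = controller[s]
        # Q_i(s, a) = R_i(s, a) + gamma * sum_{s'} P(s'|s, a) v_i(s')
        q = R[:, s, :] + gamma * (P[s] @ v.T).T  # shape (n_players, n_actions)
        a_star = int(np.argmax(q[j]))            # player j picks a(s) greedily
        v_new[:, s] = q[:, a_star]               # all players inherit j's choice
    converged = np.max(np.abs(v_new - v)) < 1e-10
    v = v_new
    if converged:
        break
```

With one player (`controller` all zeros) this reduces to ordinary value iteration, since the maximizing player and the evaluated player always coincide.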