I have been stuck on a problem for a while regarding Markov decision processes, specifically the policy improvement algorithm.
Assume that I have transition probabilities, i.e. $p_{ij}(k)$ is the probability that the system evolves from state $i$ to state $j$ if decision $k$ is made. Moreover, assume that there are two possible decisions, $k=1$ and $k=2$.
There are also immediate costs $q_{ij}(k)$, defined as the cost incurred when the system evolves from state $i$ to state $j$ under decision $k$.
How is the expected cost calculated? I have seen two methods:
1. ${C_{ik}}$ = ${\sum_{j=0}^{N}q_{ij}(k)\cdot p_{ij}(k)}$
2. ${C_{ik}}$ is taken to be the immediate cost $q_{ij}(k)$ itself, and the probabilities are ignored at this stage; they are only applied later, in the policy improvement step.
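For concreteness, here is a small sketch of method 1 in Python with NumPy, using made-up numbers for a hypothetical 3-state system with two decisions. The matrices `p[k]` and `q[k]` below are illustrative assumptions, not from any specific problem:

```python
import numpy as np

# Hypothetical 3-state MDP with two decisions, k = 1 and k = 2.
# p[k][i][j] = probability of moving from state i to state j under decision k.
# q[k][i][j] = immediate cost of that transition.
p = {
    1: np.array([[0.5, 0.3, 0.2],
                 [0.1, 0.6, 0.3],
                 [0.4, 0.4, 0.2]]),
    2: np.array([[0.2, 0.5, 0.3],
                 [0.3, 0.3, 0.4],
                 [0.5, 0.1, 0.4]]),
}
q = {
    1: np.array([[1.0, 2.0, 3.0],
                 [2.0, 1.0, 2.0],
                 [3.0, 2.0, 1.0]]),
    2: np.array([[2.0, 1.0, 1.0],
                 [1.0, 3.0, 2.0],
                 [2.0, 2.0, 3.0]]),
}

def expected_cost(i, k):
    """Method 1: C_ik = sum over j of q_ij(k) * p_ij(k)."""
    return float(np.sum(q[k][i] * p[k][i]))

for k in (1, 2):
    for i in range(3):
        print(f"C_{i},{k} = {expected_cost(i, k):.2f}")
```

For example, $C_{0,1} = 1.0 \cdot 0.5 + 2.0 \cdot 0.3 + 3.0 \cdot 0.2 = 1.7$ with these numbers, i.e. the cost of each possible transition weighted by its probability.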
I'd appreciate any help, thank you!