A doubt on markov decision process


Given that a policy is a function assigning a probability to each state–action pair, the set of policies for an MDP forms a poset, where the partial order is induced by the value functions: π ≤ π' iff the value of π is at most the value of π' in every state. Why should there be a maximal element in that poset? And are all the maximal elements the same?


I agree that the existence of a maximal element isn't obvious. Indeed, a form of your first question (i.e. the existence of an optimal policy) has occupied researchers in MDPs since the field got its start in the mid-20th century.

It was shown early on (by Ronald Howard in 1960 and David Blackwell in 1962) that for discounted and average-cost MDPs with finite state and action sets, a greatest element always exists and can be constructed in a finite number of steps; see, e.g., Chapters 6, 8, and 9 in the book Markov Decision Processes: Discrete Stochastic Dynamic Programming by Martin Puterman.
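The finite construction alluded to here is policy iteration: evaluate the current policy, improve it greedily, and repeat until no improvement is possible. Below is a minimal sketch on a made-up two-state, two-action discounted MDP (the transition probabilities, rewards, and discount factor are my invented example, not from any reference); the returned policy is a greatest element among deterministic policies.

```python
# Policy iteration on a tiny finite discounted MDP: a sketch of the fact
# that an optimal (greatest) policy exists and is reached in finitely
# many improvement steps. All numbers below are an invented example.

GAMMA = 0.9          # discount factor
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a] = list of (next_state, probability); R[s][a] = expected reward
P = {
    0: {0: [(0, 0.7), (1, 0.3)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.4), (1, 0.6)], 1: [(0, 0.9), (1, 0.1)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 3.0}}

def evaluate(policy, tol=1e-10):
    """Iteratively compute the value function of a deterministic policy."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            a = policy[s]
            v = R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    """Howard-style policy iteration: evaluate, improve, repeat."""
    policy = {s: 0 for s in STATES}
    while True:
        V = evaluate(policy)
        # Greedy improvement: pick the action maximizing the one-step lookahead
        new_policy = {
            s: max(ACTIONS,
                   key=lambda a: R[s][a]
                   + GAMMA * sum(p * V[s2] for s2, p in P[s][a]))
            for s in STATES
        }
        if new_policy == policy:   # no improvement possible: policy is optimal
            return policy, V
        policy = new_policy

optimal_policy, optimal_V = policy_iteration()
```

Since there are only finitely many deterministic policies and each improvement step strictly increases the value function, the loop must terminate, which is exactly why a greatest element exists in the finite case.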

But if either the state set or the action set is infinite, there might not be a maximal element without some additional assumptions; see, e.g., Section 6.6 in Applied Probability Models with Optimization Applications by Sheldon Ross.
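To see how existence can fail, here is a sketch of the standard counterexample (my illustration of the phenomenon, not a quotation from Ross's book): a single absorbing state with countably many actions a_1, a_2, ..., where action a_n yields reward 1 − 1/n at every step. The stationary policy "always play a_n" has discounted value (1 − 1/n)/(1 − γ); the values increase strictly toward the supremum 1/(1 − γ), but no policy attains it, so the poset has no greatest element.

```python
# One absorbing state, actions a_1, a_2, ... with per-step reward 1 - 1/n.
# The policy "always play a_n" earns (1 - 1/n) * (1 + g + g^2 + ...)
# = (1 - 1/n) / (1 - g). No n achieves the supremum 1 / (1 - g).

GAMMA = 0.9

def value(n):
    """Discounted value of the stationary policy that always plays a_n."""
    return (1 - 1 / n) / (1 - GAMMA)

# Values are strictly increasing in n but always below 1 / (1 - GAMMA) = 10
vals = [value(n) for n in range(1, 8)]
```

Each policy in this MDP is strictly dominated by another, so there is not even a maximal element, let alone a greatest one.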

Regarding your second question, I'm not aware of any examples where there is no greatest element but more than one maximal element. It might be possible to construct one using one of the examples of MDPs with no optimal policy.