A doubt on markov decision process


Given that a policy is a function assigning a probability to each state–action pair, the set of policies for an MDP forms a poset, where the partial order is induced by the value functions: π ≤ π' iff the value of π is at most the value of π' in every state. Why should there be a maximal element in that poset? And are all the maximal elements the same?


I agree that the existence of a maximal element isn't obvious. Indeed, a form of your first question (i.e. the existence of an optimal policy) has occupied researchers in MDPs since the field got its start in the mid-20th century.

It was shown early on (by Ronald Howard in 1960 and David Blackwell in 1962) that for discounted and average-cost MDPs with finite state and action sets, a greatest element always exists and can be constructed in a finite number of steps; see, e.g., Chapters 6, 8, and 9 in the book Markov Decision Processes: Discrete Stochastic Dynamic Programming by Martin Puterman.
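The finite construction alluded to here is policy iteration: evaluate the current policy, improve it greedily, and repeat until no improvement is possible. Below is a minimal sketch on a made-up two-state, two-action discounted MDP (the transition probabilities, rewards, and discount factor are my invented example, not from any reference); the returned policy is a greatest element among deterministic policies.

```python
# Policy iteration on a tiny finite discounted MDP: a sketch of the fact
# that an optimal (greatest) policy exists and is reached in finitely
# many improvement steps. All numbers below are an invented example.

GAMMA = 0.9          # discount factor
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a] = list of (next_state, probability); R[s][a] = expected reward
P = {
    0: {0: [(0, 0.7), (1, 0.3)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.4), (1, 0.6)], 1: [(0, 0.9), (1, 0.1)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 3.0}}

def evaluate(policy, tol=1e-10):
    """Iteratively compute the value function of a deterministic policy."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            a = policy[s]
            v = R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    """Howard-style policy iteration: evaluate, improve, repeat."""
    policy = {s: 0 for s in STATES}
    while True:
        V = evaluate(policy)
        # Greedy improvement: pick the action maximizing the one-step lookahead
        new_policy = {
            s: max(ACTIONS,
                   key=lambda a: R[s][a]
                   + GAMMA * sum(p * V[s2] for s2, p in P[s][a]))
            for s in STATES
        }
        if new_policy == policy:   # no improvement possible: policy is optimal
            return policy, V
        policy = new_policy

optimal_policy, optimal_V = policy_iteration()
```

Since there are only finitely many deterministic policies and each improvement step strictly increases the value function, the loop must terminate, which is exactly why a greatest element exists in the finite case.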

But if either the state set or the action set is infinite, there might not be a maximal element without some additional assumptions; see, e.g., Section 6.6 in Applied Probability Models with Optimization Applications by Sheldon Ross.
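To see how existence can fail, here is a sketch of the standard counterexample (my illustration of the phenomenon, not a quotation from Ross's book): a single absorbing state with countably many actions a_1, a_2, ..., where action a_n yields reward 1 − 1/n at every step. The stationary policy "always play a_n" has discounted value (1 − 1/n)/(1 − γ); the values increase strictly toward the supremum 1/(1 − γ), but no policy attains it, so the poset has no greatest element.

```python
# One absorbing state, actions a_1, a_2, ... with per-step reward 1 - 1/n.
# The policy "always play a_n" earns (1 - 1/n) * (1 + g + g^2 + ...)
# = (1 - 1/n) / (1 - g). No n achieves the supremum 1 / (1 - g).

GAMMA = 0.9

def value(n):
    """Discounted value of the stationary policy that always plays a_n."""
    return (1 - 1 / n) / (1 - GAMMA)

# Values are strictly increasing in n but always below 1 / (1 - GAMMA) = 10
vals = [value(n) for n in range(1, 8)]
```

Each policy in this MDP is strictly dominated by another, so there is not even a maximal element, let alone a greatest one.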

Regarding your second question, I'm not aware of any examples where there is no greatest element but more than one maximal element. It might be possible to construct one using one of the examples of MDPs with no optimal policy.