It is well known that in MDPs with a discrete state space, there exists a deterministic policy that is optimal, in the sense that it maximizes the total expected discounted reward from any initial state. I was wondering if this is also true for MDPs with a continuous state space and looking for a reference where this result was derived. Thanks in advance!
Is there always an optimal deterministic policy in MDPs with a continuous state space?
First, I do want to point out that the existence of optimal policies is not guaranteed in discrete problems when the discount factor $\beta = 1$.
Example. Let $\mathbb{X} = \{1,2,\dots\} \cup Z$ be the state space with $Z = \{z_1, z_2\}$ as dead states, and let $\mathbb{A} = \{s,c\}$ be the action space ("stop" and "continue"). The dynamics of the system follow the rule \begin{equation} x_{t+1} = f(x_t,a_t) = \begin{cases} z_2 & \text{if $x_t \in Z$,} \\ z_1 & \text{if $x_t \notin Z$ and $a_t = s$,} \\ x_t+1 & \text{if $x_t \notin Z$ and $a_t = c$.} \end{cases} \end{equation} Starting from $x=1,2,\dots$ you either move right, or you go to the first dead state $z_1$ and then proceed to the next dead state $z_2$, where you stay forever.
The cost function \begin{equation} c(x,a) = \begin{cases} x^{-1} & \text{if $x \notin Z$ and $a=s$,} \\ -1 & \text{if $x = z_1$,} \\ 0 & \text{otherwise,} \end{cases} \end{equation} is bounded: $|c(x,a)| \leq 1$. If you "die" at state $x = 1, 2, \dots$ (i.e., play $s$ there), you pay a total of $x^{-1}-1$. Otherwise, you keep moving right at no cost.
Let's calculate the value function \begin{equation} v(x) = \inf_{\pi} v^{\pi}(x), \qquad v^{\pi}(x) := \sum_{t=0}^{\infty} c(x_t,\pi(x_t)), \quad x_0 = x,\ x_{t+1} = f(x_t,\pi(x_t)), \end{equation} where the policies $\pi$ are deterministic (relaxing this does not affect the analysis).
Clearly, $-1 \leq v(x) \leq 0$. Let $\pi_n$ denote the policy that plays $c$ until the state first reaches or exceeds $n$ and then plays $s$; by "dying" on or after the state $x=n$, it pays a total of no more than $n^{-1}-1$. Then for $x=1,2,\dots$ we obtain $v(x) \leq v^{\pi_n}(x) \leq n^{-1}-1$ for all $n$; hence, $v(x) = -1$ for each $x=1,2,\dots$. But there is no policy that achieves this value: any policy either never plays $s$, incurring total cost $0$, or plays $s$ at some finite state $x' \geq x$, incurring total cost $(x')^{-1}-1 > -1$.
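If it helps to see the numbers, here is a small Python sketch of this example (the helper name `policy_cost` is purely illustrative): the totals $n^{-1}-1$ for the policies $\pi_n$ creep down toward $-1$, but each individual policy pays strictly more than $-1$.

```python
# Sketch of the beta = 1 counterexample above.
# pi_n plays "c" (move right, cost 0) until the state is at least n,
# then plays "s" (pay 1/x and move to z1, where the cost -1 is incurred).

def policy_cost(start: int, n: int) -> float:
    """Total undiscounted cost of the policy pi_n started at state `start`."""
    x = max(start, n)      # keep playing "c" at no cost until the state is at least n
    return 1.0 / x - 1.0   # pay x^{-1} when stopping, then -1 at z1, then 0 forever

if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10**6):
        print(n, policy_cost(1, n))   # 0.0, -0.9, -0.99, -0.999, ... -> -1, never attained
```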
Continuous state space models
Very little can be said in continuous state space models without additional assumptions. For example, the value function may not be well-defined unless measurability considerations are taken into account.
Three types of general assumptions are commonly imposed to ensure that the infinite-horizon value functions are well defined:
Assumption P. The costs $c(x,a)$ are nonnegative for all $x$ and $a$.
Assumption N. The costs are nonpositive for all $x$ and $a$.
Assumption D. The discount factor $\beta \in (0,1)$, and the costs are bounded: $|c(x,a)| < M$ for some constant $M$ and all $x$ and $a$.
Essentially, these do away with convergence issues in the infinite sum.
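To see why, note that under Assumption D the tail of the series is controlled by a geometric series, \begin{equation} \sum_{t=n}^{\infty} \beta^t |c(x_t,a_t)| \leq M \sum_{t=n}^{\infty} \beta^t = \frac{M\beta^n}{1-\beta} \xrightarrow[n\to\infty]{} 0, \end{equation} while under Assumptions P and N the partial sums are monotone, so they have a (possibly infinite) limit.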
Broadly speaking, these assumptions are developed in two major directions: semicontinuous models and Borel models. I could be biased here because this is my research area, but my impression is that most people care about problems that fall into some form of a semicontinuous model. I am not going to touch Borel models here; they rely heavily on fairly arcane measure-theoretic machinery, such as analytic sets and Suslin spaces. See the references at the end. Here is a typical presentation of the assumptions in the semicontinuous model:
- $\mathbb{X}$ and $\mathbb{A}$ are Borel subsets of Polish (complete, separable) metric spaces;
- $c(x,a)$ is lower semicontinuous and bounded below;
- the transition kernel $P(dx_{t+1}|x_t,a_t)$ is weakly continuous (this is one of a few standard continuity conditions one can impose on the kernel).
Under assumptions like these, the value functions \begin{align} v_n(x) &= \inf_{\pi} v_{n}^\pi(x), & v_{n}^\pi(x) &:= \mathbb{E}_x^\pi \sum_{t=0}^{n-1} \beta^t c(x_t, a_t), \\ v(x) &= \inf_{\pi} v^\pi(x), & v^\pi(x) &:= \mathbb{E}_x^\pi \sum_{t=0}^{\infty} \beta^t c(x_t, a_t), \end{align} are well defined, where $\pi$ is a policy and $\mathbb{E}_x^\pi$ denotes the expectation operator for the process starting at $x$ under $\pi$. In the most general form, policies are sequences of stochastic kernels $\pi = (\pi_0, \pi_1, \dots)$ where $\pi_t(da_t|x_0,a_0,x_1,a_1,\dots,x_{t})$ depends on the entire history of the process. However, in many settings it is sufficient to consider some combination of deterministic, Markov, or stationary policies.
Under the above assumptions (typically together with a compactness or inf-compactness condition on the actions), we get the following facts:

- the value functions $v_n$ and $v$ are lower semicontinuous, so the measurability concerns mentioned above disappear;
- $v$ satisfies the optimality (Bellman) equation \begin{equation} v(x) = \min_{a \in \mathbb{A}} \Big[ c(x,a) + \beta \int_{\mathbb{X}} v(y)\, P(dy|x,a) \Big]; \end{equation}
- the minimum over actions is attained and admits a measurable selector, and under Assumptions P and D the resulting deterministic stationary policy is optimal.

In particular, the answer to your question is yes in this setting: an optimal deterministic (stationary) policy exists.
It is possible to tailor some of these assumptions in different directions, but this is the gist.
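To see where the deterministic policy comes from in computations, here is a minimal value-iteration sketch under Assumption D. The finite grid of states and actions is a hypothetical stand-in for a discretized continuous model (so none of the measurability subtleties appear); the `argmin` in the last line plays the role of the selector that produces a deterministic stationary policy.

```python
import numpy as np

# Minimal value-iteration sketch under Assumption D (beta in (0,1), bounded costs).
# The continuous state space is replaced by a hypothetical finite grid, so this only
# illustrates how v and a deterministic stationary policy arise from the optimality
# equation; it does not address the measurable-selection issues of the continuous case.

n_states, n_actions, beta = 50, 3, 0.9
rng = np.random.default_rng(0)

cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))   # c(x, a), bounded
P = rng.uniform(size=(n_states, n_actions, n_states))      # unnormalized P(y | x, a)
P /= P.sum(axis=2, keepdims=True)                           # normalize to a transition kernel

v = np.zeros(n_states)
for _ in range(1000):
    # Bellman operator: (Tv)(x) = min_a [ c(x, a) + beta * sum_y P(y | x, a) v(y) ]
    q = cost + beta * P @ v                                 # Q-values, shape (n_states, n_actions)
    v_new = q.min(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:                   # beta-contraction guarantees convergence
        v = v_new
        break
    v = v_new

policy = q.argmin(axis=1)   # deterministic stationary policy: a selector of the minimizers
print(v[:5], policy[:5])
```

In the genuinely continuous case, the `argmin` step is replaced by a measurable-selection argument, which is where the semicontinuity and compactness assumptions do their work.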
Some texts you may find useful:
Bertsekas, D.P. and Shreve, S.E. Stochastic Optimal Control: The Discrete-Time Case (1996), Athena Scientific.
Hernández-Lerma, O. Adaptive Markov Control Processes (1989), Springer.
Feinberg, E.A. and Shwartz, A. Handbook of Markov Decision Processes (2002), Springer.
For a recent paper along these lines (including the average cost criterion), see here.