Existence of $\varepsilon$-optimal Borel measurable policies in stochastic control


I am reading the book "Stochastic Optimal Control: The Discrete Time Case" by Bertsekas and Shreve (hereafter "the Book"), and I recently noticed that a statement made on page 10 (in the Introduction) can seemingly be stated somewhat more generally.

The statement in question is as follows.

Let $\mathscr{B}_\mathbb{R}$, $\mathscr{B}_{\mathbb{R}^2}$ denote the Borel $\sigma$-algebras on $\mathbb{R}$, $\mathbb{R}^2$, and consider a Borel measurable function $g:\mathbb{R}^2\rightarrow\mathbb{R}$, such that

\begin{equation} \inf_{u\in\mathbb{R}}g\left(x,u\right)>-\infty,\quad \forall x\in \mathbb{R}. \end{equation}

Consider now the set of all Borel measurable functions (policies) from $\mathbb{R}$ to $\mathbb{R}$ and denote it by $\cal{P}$.

I claim that, for any $\varepsilon>0$, there exists a Borel measurable policy $\mu_\varepsilon\in\cal{P}$, such that

\begin{equation} g\left(x, \mu_\varepsilon \left(x \right) \right) \le\inf_{u\in\mathbb{R}}g\left(x,u\right) + \varepsilon,\quad \forall x\in \mathbb{R}. \end{equation}

In the Book, on the other hand, it is claimed that the inequality above holds only almost everywhere, with respect to some given probability measure on $\mathscr{B}_\mathbb{R}$.
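
To make the contrast explicit, here is the Book's version as I read it, writing $p$ for the given probability measure (the symbol $p$ is my own notation and may differ from the Book's):

\begin{equation} g\left(x, \mu_\varepsilon \left(x \right) \right) \le\inf_{u\in\mathbb{R}}g\left(x,u\right) + \varepsilon,\quad \text{for } p\text{-almost every } x\in \mathbb{R}. \end{equation}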

The proof of my claim follows.


First, for any measurable policy $\mu\in\cal{P}$, it holds that

\begin{equation} g\left(x,\mu \left( x \right) \right) \ge \inf_{\mu\in\cal{P}}g\left(x, \mu \left( x \right) \right) \ge \inf_{u\in\mathbb{R}}g\left(x,u\right)>-\infty,\quad \forall x\in \mathbb{R}. \end{equation}

Fix an $\varepsilon>0$. Then, there exists a Borel measurable policy $\mu_\varepsilon\in\cal{P}$, such that

\begin{equation} g\left(x, \mu_\varepsilon \left(x \right) \right) \le \inf_{\mu\in\cal{P}}g\left(x, \mu \left( x \right) \right) + \varepsilon,\quad \forall x\in \mathbb{R}. \end{equation}

Note that such a policy can always be found, since otherwise we would arrive at a contradiction: if no such policy existed, then it would be true that

\begin{equation} g\left(x, \mu \left(x \right) \right) > \inf_{\mu\in\cal{P}}g\left(x, \mu \left( x \right) \right) + \varepsilon,\quad \forall x\in \mathbb{R}, \end{equation}

for all $\mu\in\cal{P}$, contradicting the fact that $\inf_{\mu\in\cal{P}}g\left(x, \mu \left( x \right) \right)$ is the infimum over $\mu\in\cal{P}$.

Now, since $\cal{P}$ is the class of all Borel measurable functions from $\mathbb{R}$ to itself, the set containing all constant policies, defined as

\begin{equation} \mu_u \left( x \right) \triangleq u,\quad \forall x\in\mathbb{R}, \quad\text{for some } u\in\mathbb{R}, \end{equation}

will be a subset of $\cal{P}$, and, therefore,

\begin{equation} \inf_{\mu\in\cal{P}}g\left(x, \mu \left( x \right) \right) \le g\left(x, \mu_u \left( x \right) \right) = g\left(x, u \right) \quad\forall x\in\mathbb{R}\quad\text{and}\quad \forall u\in\mathbb{R}. \end{equation}

In particular, taking infima on both sides, it will also be true that

\begin{equation} \inf_{\mu\in\cal{P}}g\left(x, \mu \left( x \right) \right) \le \inf_{u\in\mathbb{R}}g\left(x, u \right) \quad\forall x\in\mathbb{R}. \end{equation}
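
Combining this with the first chain of inequalities in the proof, the two infima should coincide for each fixed $x$, if I am not mistaken:

\begin{equation} \inf_{\mu\in\cal{P}}g\left(x, \mu \left( x \right) \right) = \inf_{u\in\mathbb{R}}g\left(x, u \right) \quad\forall x\in\mathbb{R}. \end{equation}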

This last inequality implies that there exists $\mu_\varepsilon\in\cal{P}$, such that

\begin{equation} g\left(x, \mu_\varepsilon \left(x \right) \right) \le\inf_{u\in\mathbb{R}}g\left(x,u\right) + \varepsilon,\quad \forall x\in \mathbb{R}, \end{equation}

for any arbitrarily chosen $\varepsilon>0$, which seems to prove my claim.


Have I done anything wrong in the above derivation? Why is it claimed in the Book that this result holds only almost everywhere in $x$?

Thanks!