How to solve this exercise about the Hamilton-Jacobi-Bellman equation?

I found this exercise in my stochastic analysis notes, but at the time no solution was provided. I am trying to solve it now, but I am struggling, since I have no background in control theory.

For some bounded, measurable control $c(s,x)$ let

$$X_{s}^{t,x,c} = x + \int_{t}^{s} c(r,X_{r}^{t,x,c}) dr + \int_{t}^{s} dB_r,$$

for $s \in [t,T]$, that is, $X_{s}^{t,x,c}$ is the process obtained by starting at time $t$ with value $x$ and applying the control $c$ under the influence of Brownian noise. For some $\phi$ let $$V_{t}^{x,c} = \mathbb{E}\left[ \phi\left(X_{T}^{t,x,c}\right) - \frac{1}{2} \int_t^T ||c\left(r,X_{r}^{t,x,c}\right)||^2 dr \right].$$

This means that $\phi$ is some payoff/return we obtain at the final time $T$, but we pay a cost for the control, which here is simply half the time integral of its squared norm. Thus $V$ is the expected net value we obtain when starting with $x$ at time $t$ and using the control function $c$.

Now let $u(t,x)$ solve the pde

$$ \partial_t u = - \frac{1}{2} \left(\Delta u + ||\nabla u||^2\right)$$ $$ u(T,x) = \phi(x),$$

and we also assume (or know a posteriori?) that $u$ is bounded. Then the goal of the exercise is to show that $$ u(t,x) = \underset{c}{\sup} V_{t}^{x,c},$$

that is $u$ gives the optimal value we can obtain at the last timestep when we are at $x$ at time $t$.
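On the parenthetical question of whether boundedness of $u$ is an assumption or a consequence, a standard observation (the Cole-Hopf transform, which is not mentioned in the exercise itself) may help. As a sketch, assuming $u$ is smooth enough and $\phi$ is bounded, set $v = e^{u}$. Then

\begin{align*} \partial_t v &= e^{u}\, \partial_t u, \\ \Delta v &= e^{u}\left(\Delta u + ||\nabla u||^2\right), \end{align*}

so the pde for $u$ turns into the backward heat equation $\partial_t v = -\frac{1}{2}\Delta v$ with $v(T,x) = e^{\phi(x)}$. Its bounded solution is $v(t,x) = \mathbb{E}\left[e^{\phi(x + B_T - B_t)}\right]$, which suggests $$u(t,x) = \log \mathbb{E}\left[e^{\phi(x + B_T - B_t)}\right],$$ and in particular $\inf \phi \leq u \leq \sup \phi$.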

Finally, the exercise gives the hint to proceed in two steps:

  1. Show that $V_{t}^{x,c} \leq u(t,x)$ for every control $c$.
  2. Choose $c^*$ such that $V_{t}^{x,c^*} = u(t,x).$

Now, I am still somewhat confused by the dependencies in this exercise and do not see how to do the first step. For the second step, the pde that $u$ solves suggests where we want to go: Letting $$X_{s}^{t,x,\nabla u} = x + \int_{t}^{s} \nabla u (r,X_{r}^{t,x,\nabla u}) dr + \int_{t}^{s} dB_r ,$$ and $$\mathcal{L}_{\nabla u}(f) = \nabla u \cdot \nabla f + \frac{1}{2} \Delta f,$$ we obtain (by Itô's formula, in the same way as the Kolmogorov forward equation is proved) that \begin{align*} &u(s,X_{s}^{t,x,\nabla u})-u(t,X_{t}^{t,x,\nabla u}) - \int_{t}^s (\partial_t + \mathcal{L}_{\nabla u}) u(r,X_{r}^{t,x,\nabla u}) dr \\ &=u(s,X_{s}^{t,x,\nabla u})-u(t,X_{t}^{t,x,\nabla u}) -\frac{1}{2} \int_{t}^s ||\nabla u(r,X_{r}^{t,x,\nabla u})||^2 dr \end{align*} is a martingale in $s$, since the pde gives $(\partial_t + \mathcal{L}_{\nabla u}) u = \frac{1}{2}||\nabla u||^2$. This implies in particular that

\begin{align*}V_{t}^{x,\nabla u} &= \mathbb{E}\left[\phi(X_{T}^{t,x,\nabla u})- \frac{1}{2} \int_{t}^T ||\nabla u(r,X_{r}^{t,x,\nabla u})||^2 dr \right]\\ &= \mathbb{E}\left[u(T,X_{T}^{t,x,\nabla u})- \frac{1}{2} \int_{t}^T ||\nabla u(r,X_{r}^{t,x,\nabla u})||^2 dr \right] \\ &= \mathbb{E}[u(t,X_{t}^{t,x,\nabla u})] = u(t,x) ,\end{align*}

which shows step 2. Of course, in hindsight it makes sense to choose the control as a gradient ascent on $u$, since $u$ conveys the optimal value we can get out in the end. However, without knowing this a priori it is unclear to me how to bound $V$ by $u$, i.e. how to do the first step.

Any hints would be appreciated! (Also note that this was an exercise in a stochastic analysis class, so I think it should only use techniques from that setting, e.g. I would be surprised if I had to use some form of functional derivative.)

There is 1 best solution below


It seems that step 1 can, in the end, be done with the same trick as step 2:

Consider any control $c(s,x)$. Then we consider the process

$$X_{s}^{t,x,c}$$ defined as above, and also the operator $$\mathcal{L}_{c}f = c\cdot \nabla f + \frac{1}{2}\Delta f.$$

We find that the process

\begin{equation} u(s,X_{s}^{t,x,c}) - u(t,X_t^{t,x,c}) - \int_t^s \left(\partial_t + \mathcal{L}_c\right) u(r,X_r^{t,x,c}) dr\end{equation} is a martingale. Now by the pde that $u$ solves we get that $$\left(\partial_t + \mathcal{L}_c\right) u = c\cdot \nabla u - \frac{1}{2} ||\nabla u||^2.$$
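For completeness, the martingale claim can be read off from Itô's formula. As a sketch, assuming $u$ is $C^{1,2}$ and $\nabla u$ is bounded (neither is stated in the exercise), one has

$$u(s,X_s^{t,x,c}) - u(t,x) = \int_t^s \left(\partial_t + \mathcal{L}_c\right) u(r,X_r^{t,x,c})\, dr + \int_t^s \nabla u(r,X_r^{t,x,c}) \cdot dB_r,$$

and the stochastic integral on the right is a martingale in $s$ with mean zero, which is why it drops out when taking expectations.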

As before, taking the expectation at $s = T$ and using the martingale property together with $u(T,\cdot) = \phi$, we arrive at

$$\mathbb{E}[\phi(X_{T}^{t,x,c})] = u(t,x) +\mathbb{E}\left[ \int_{t}^{T} \left( c(r,X_r^{t,x,c})\cdot \nabla u(r,X_r^{t,x,c})- \frac{1}{2} ||\nabla u(r,X_r^{t,x,c})||^2 \right) dr \right].$$

Now subtracting $\mathbb{E}\left[\frac{1}{2}\int_{t}^T ||c(r,X_{r}^{t,x,c})||^2 dr\right]$ from both sides yields

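$$ \mathbb{E}\left[\phi(X_T^{t,x,c}) - \frac{1}{2} \int_{t}^T ||c(r,X_{r}^{t,x,c})||^2 dr\right] = u(t,x) - \frac{1}{2} \mathbb{E}\left[\int_t^T || \nabla u (r,X_r^{t,x,c}) - c(r,X_r^{t,x,c}) ||^2 dr\right],$$

where we completed the square via $c\cdot \nabla u - \frac{1}{2}||\nabla u||^2 - \frac{1}{2}||c||^2 = -\frac{1}{2}||\nabla u - c||^2$. The left-hand side is $V_t^{x,c}$ and the last term is nonpositive, which shows step 1. Equality holds for $c = \nabla u$, which recovers step 2.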
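As a quick numerical sanity check of both steps, here is a minimal Python sketch. It is not part of the exercise: it assumes dimension one, the linear payoff $\phi(x) = ax$ (unbounded, so strictly outside the stated assumptions, but with the closed-form solution $u(t,x) = ax + \frac{1}{2}a^2(T-t)$ of the pde), and an Euler-Maruyama discretization. The function name `estimate_value` and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def estimate_value(control, phi, t, x, T, n_paths=4000, n_steps=20, seed=0):
    """Monte Carlo estimate of V = E[phi(X_T) - 1/2 * int_t^T |c|^2 dr] in 1D.

    The path X is simulated with an Euler-Maruyama scheme for
    dX = c(r, X) dr + dB (illustrative discretization, not from the exercise).
    """
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        pos, cost = x, 0.0
        for k in range(n_steps):
            drift = control(t + k * dt, pos)
            cost += 0.5 * drift * drift * dt                    # running control cost
            pos += drift * dt + sqrt_dt * rng.gauss(0.0, 1.0)   # drift + Brownian increment
        total += phi(pos) - cost
    return total / n_paths


# Assumed toy setup: phi(y) = a*y, for which u(t,x) = a*x + a^2*(T-t)/2.
a, t, x, T = 1.0, 0.0, 0.3, 1.0
phi = lambda y: a * y
u = a * x + 0.5 * a * a * (T - t)

v_opt = estimate_value(lambda r, y: a, phi, t, x, T)        # c = grad u = a
v_zero = estimate_value(lambda r, y: 0.0, phi, t, x, T)     # no control
v_over = estimate_value(lambda r, y: 2 * a, phi, t, x, T)   # overshooting control

print("u(t,x)      =", u)
print("V with c=a  =", v_opt)
print("V with c=0  =", v_zero)
print("V with c=2a =", v_over)
```

For constant controls the identity from step 1 predicts $V_t^{x,c} = u(t,x) - \frac{1}{2}(a-c)^2(T-t)$, so up to Monte Carlo error the estimate for $c = a$ should match $u(t,x)$, while those for $c = 0$ and $c = 2a$ should fall below it by about $0.5$.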