HJB when optimal control responds partially to Brownian motion

61 Views Asked by Bumbble Comm At 10 May 2026 - 3:09

I cannot formulate the HJB equation for $V$ in a simple problem where the state $a$ follows a Brownian motion with drift and the control $\ell$ is (for some states) adjusted so that the derivative $V_\ell$ stays constant.

In detail:

With state $a$ and control $\ell$, the objective is to maximize

\begin{align*} V(a(0), \ell(0)) &= \max_{n(t)} \mathbb{E}_0 \int_0^\infty e^{-\rho t}\big[\pi(a(t), \ell(t)) - \chi |n(t)|\big] dt\\ &\text{s.t. }\\ \dot \ell(t) &= n(t) \\ da(t) &= \mu dt + \sigma dW(t) \\ \ell(0), a(0) &\text{ given} \end{align*}

where $\pi_a$, $\pi_\ell$, and $\pi_{a\ell}$ are all positive.

We focus on the $(a, \ell)$ locus where $V_\ell(a, \ell) = \chi$ and postulate the following optimal policy: any positive increment in $a$ triggers a positive response in $\ell$ such that $V_\ell(a, \ell) = \chi$ holds again.

We solve for this response in the following discrete-time representation. Each period has length $\Delta_t$. Within each period, $a$ increases by $\sigma \sqrt{\Delta_t}$ with probability $p(\Delta_t)$ and decreases by $\sigma \sqrt{\Delta_t}$ with probability $1 - p(\Delta_t)$, where

$$ p(\Delta_t) = \frac{1}{2}\left(1 + \frac{\mu \sqrt{\Delta_t}}{\sigma}\right) $$
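As a quick sanity check on the step probability above, here is a minimal Python sketch that computes the one-period mean and variance of the increment in $a$. The parameter values and the function names are my own illustrative choices, not part of the problem.

```python
import math

def up_probability(mu, sigma, dt):
    """Probability of an upward step of size sigma*sqrt(dt)."""
    return 0.5 * (1.0 + mu * math.sqrt(dt) / sigma)

def step_moments(mu, sigma, dt):
    """Mean and variance of the one-period increment of a."""
    p = up_probability(mu, sigma, dt)
    h = sigma * math.sqrt(dt)
    mean = p * h - (1.0 - p) * h
    variance = h * h - mean ** 2  # both outcomes have the same square, h^2
    return mean, variance

# Hypothetical parameter values, chosen only for illustration.
mu, sigma = 0.05, 0.2
for dt in (1e-1, 1e-2, 1e-4):
    mean, variance = step_moments(mu, sigma, dt)
    # mean/dt should equal mu; variance/dt should approach sigma^2 as dt shrinks
    print(f"dt={dt:g}  mean/dt={mean / dt:.6f}  var/dt={variance / dt:.6f}")
```

The mean per unit time equals $\mu$ for every $\Delta_t$, and the variance per unit time is $\sigma^2 - \mu^2 \Delta_t$, which tends to $\sigma^2$.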

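This random walk approximation converges to the Brownian motion with drift as $\Delta_t \to 0$. We postulate that the change in $\ell$ after an upward move in $a$ takes the form $\Delta_\ell = n \sqrt{\Delta_t}$, and we require that $V_\ell = \chi$ still holds after the move. A second-order Taylor expansion gives: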

\begin{align*} V_\ell(a + \sigma \sqrt{\Delta_t}, \ell + n\sqrt{\Delta_t}) &= \chi = V_\ell(a, \ell)\\ V_\ell(a, \ell) + V_{\ell a}(a, \ell)\sigma \sqrt{\Delta_t} + \frac{\sigma^2 \Delta_t}{2}V_{\ell a a} + V_{\ell\ell}n\sqrt{\Delta_t} + V_{\ell a \ell}\sigma n \Delta_t + V_{\ell\ell\ell}\frac{n^2 \Delta_t}{2} + o(\Delta_t) &= V_\ell(a, \ell) \end{align*}

Subtract $V_\ell(a, \ell)$ from both sides, divide by $\sqrt{\Delta_t}$, and solve for $n$:

\begin{align*} V_{\ell\ell}n + V_{\ell a \ell}\sigma n \sqrt{\Delta_t} + V_{\ell\ell\ell}\frac{n^2 \sqrt{\Delta_t}}{2} &= -V_{\ell a}(a, \ell)\sigma -\frac{\sigma^2 \sqrt{\Delta_t}}{2}V_{\ell a a}(a, \ell) - \frac{o(\Delta_t)}{\sqrt{\Delta_t}}\\ \lim_{\Delta_t \to 0} V_{\ell\ell}(a, \ell)n&= -V_{\ell a}(a, \ell)\sigma \\ n &= -\sigma\frac{V_{\ell a}(a, \ell)}{V_{\ell\ell}(a, \ell)} \end{align*}

If we had instead assumed $\Delta_\ell = n \Delta_t$, we would have had to divide by $\Delta_t$ in this step, and $V_{\ell a} \sigma/\sqrt{\Delta_t}$ would have appeared on the right-hand side. This term explodes as $\Delta_t \to 0$, so the increment in $\ell$ has to be of order $\sqrt{\Delta_t}$ rather than $\Delta_t$.
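To illustrate the scaling argument numerically, here is a minimal sketch that solves for the increment in $\ell$ restoring $V_\ell$ after an upward move in $a$. Since $V$ is unknown, the function `marginal_value` is a purely hypothetical stand-in for $V_\ell$ (increasing in $a$, decreasing in $\ell$), and all parameter values are assumptions made for illustration only.

```python
import math

def marginal_value(a, ell):
    """Hypothetical stand-in for V_ell(a, ell); NOT derived from the problem."""
    return math.exp(a) / math.sqrt(ell)

def required_increment(a, ell, sigma, dt, tol=1e-14):
    """Bisection for d >= 0 with V_ell(a + sigma*sqrt(dt), ell + d) = V_ell(a, ell)."""
    h = sigma * math.sqrt(dt)
    target = marginal_value(a, ell)
    lo, hi = 0.0, 1.0
    while marginal_value(a + h, ell + hi) > target:  # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_value(a + h, ell + mid) > target:
            lo = mid  # marginal value still too high: ell must rise further
        else:
            hi = mid
    return 0.5 * (lo + hi)

def limiting_n(a, ell, sigma, eps=1e-6):
    """-sigma * V_{ell a} / V_{ell ell}, using central finite differences."""
    g_a = (marginal_value(a + eps, ell) - marginal_value(a - eps, ell)) / (2 * eps)
    g_ell = (marginal_value(a, ell + eps) - marginal_value(a, ell - eps)) / (2 * eps)
    return -sigma * g_a / g_ell

a0, ell0, sigma = 0.0, 1.0, 0.2  # hypothetical values
for dt in (1e-2, 1e-4, 1e-6):
    d = required_increment(a0, ell0, sigma, dt)
    print(f"dt={dt:g}  d/sqrt(dt)={d / math.sqrt(dt):.6f}  d/dt={d / dt:.3f}")
print("limit formula:", limiting_n(a0, ell0, sigma))
```

For this stand-in function, the ratio of the increment to $\sqrt{\Delta_t}$ settles near the value given by $-\sigma V_{\ell a}/V_{\ell\ell}$, while the ratio to $\Delta_t$ keeps growing as $\Delta_t$ shrinks.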

So let's solve for the recursive formulation of $V$ given this optimal $n$, recognizing that a change in $\ell$ is triggered only when $a$ has a positive increment. That is, we still focus on $(a, \ell)$ where $V_\ell(a, \ell) = \chi$.

\begin{align*} V(a, \ell) &= \pi(a, \ell)\Delta_t + (1-\rho\Delta_t) \left[ p(\Delta_t)V(a + \sigma \sqrt{\Delta_t}, \ell + n\sqrt{\Delta_t}) + (1 - p(\Delta_t))V(a - \sigma \sqrt{\Delta_t}, \ell)\right] \end{align*}

Again, take a second-order Taylor series approximation:

\begin{align*} V(a, \ell) &= \pi(a, \ell)\Delta_t + (1-\rho\Delta_t) \bigg\{ p(\Delta_t)\big[V(a, \ell) + V_a \sigma \sqrt{\Delta_t} + \frac{\sigma^2\Delta_t}{2}V_{aa} + V_\ell n \sqrt{\Delta_t} + V_{a\ell}\sigma n \Delta_t + \frac{n^2}{2}V_{\ell\ell}\Delta_t\big] \\ &+ (1 - p(\Delta_t))\big[V(a, \ell) - V_a \sigma \sqrt{\Delta_t} + \frac{\sigma^2\Delta_t}{2}V_{aa}\big]\bigg\} \\ V(a, \ell) &= \pi(a, \ell)\Delta_t + (1-\rho\Delta_t) \bigg\{ V(a, \ell) + (2p(\Delta_t) - 1) V_a \sigma \sqrt{\Delta_t} + \frac{\sigma^2\Delta_t}{2}V_{aa} + p(\Delta_t)\big[V_\ell n \sqrt{\Delta_t} + V_{a\ell}\sigma n \Delta_t + \frac{n^2}{2}V_{\ell\ell}\Delta_t\big]\bigg\} \end{align*}

To solve for $V$, we subtract $(1-\rho\Delta_t)V$ from both sides and divide by $\Delta_t$, before letting $\Delta_t \to 0$:

\begin{align*} \rho V(a, \ell) &= \pi(a, \ell) + (1-\rho\Delta_t) \bigg\{\frac{2p(\Delta_t) - 1}{\sqrt{\Delta_t}} V_a \sigma + \frac{\sigma^2}{2}V_{aa} + \frac{p(\Delta_t)}{\sqrt{\Delta_t}}V_\ell n + p(\Delta_t)\big[V_{a\ell}\sigma n + \frac{n^2}{2}V_{\ell\ell}\big]\bigg\} \end{align*}

Two expressions involving $\Delta_t$ need care before letting $\Delta_t \to 0$. The first is $(2p(\Delta_t) - 1)/\sqrt{\Delta_t} = \mu/\sigma$, which is fine. The second expression is the key problem:

\begin{align*} \frac{p(\Delta_t)}{\sqrt{\Delta_t}} &= \frac{1}{2}\left(1 + \frac{\mu\sqrt{\Delta_t}}{\sigma}\right)\frac{1}{\sqrt{\Delta_t}} = \frac{1}{2\sqrt{\Delta_t}} + \frac{\mu}{2\sigma} \end{align*}

which implies that it will explode when $\Delta_t \to 0$.
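A minimal numerical sketch of the two coefficients, using the $p(\Delta_t)$ defined earlier, makes the contrast concrete. The parameter values and function names are hypothetical and serve only as an illustration.

```python
import math

def up_probability(mu, sigma, dt):
    """Probability of an upward step in the random walk approximation."""
    return 0.5 * (1.0 + mu * math.sqrt(dt) / sigma)

def drift_coefficient(mu, sigma, dt):
    """(2p - 1)/sqrt(dt): multiplies V_a * sigma in the recursion."""
    return (2.0 * up_probability(mu, sigma, dt) - 1.0) / math.sqrt(dt)

def one_sided_coefficient(mu, sigma, dt):
    """p/sqrt(dt): multiplies V_ell * n, which only enters on upward moves."""
    return up_probability(mu, sigma, dt) / math.sqrt(dt)

mu, sigma = 0.05, 0.2  # hypothetical values
for dt in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"dt={dt:g}  (2p-1)/sqrt(dt)={drift_coefficient(mu, sigma, dt):.6f}"
          f"  p/sqrt(dt)={one_sided_coefficient(mu, sigma, dt):.3f}")
```

The first coefficient stays at $\mu/\sigma$ for every $\Delta_t$, while the second grows like $1/(2\sqrt{\Delta_t})$.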

The superficial reason this happens is that the drift term and $p(\Delta_t)$ do not cancel out the $\sqrt\Delta_t$ term. For $a$, the upwards and downwards increment components together cancel out that term. For $\ell$, only upwards increments matter and the term does not cancel out. This seems to suggest that a recursive formulation of the value function under the optimal policy in continuous time does not exist.

Is there another approach to deriving $V$, or the optimal policy $n$? Am I missing something?
