Consider a state $x_t$ that probabilistically evolves over time according to a controlled Markov chain, i.e., according to known probabilities
$$\mathbb P(x_{t+1}=x' \,|\, x_t=x,a_t=a)$$
where $a_t$ is the control action at time $t$. To each state $x$, a known reward $r(x)$ is assigned. Actions have no cost. Our objective is to choose the actions so as to maximize the expected reward at a given final time $T$, i.e. $\mathbb E[r(x_T)]$. Rewards before the final time do not matter.
It seems that this problem can naturally be addressed by dynamic programming. I've defined the final value function as follows
\begin{align} V_T(x_T) = r(x_T) \end{align}
and value functions for each $t<T$ as
\begin{align} V_t(x) &= \max_{a\in A}\mathbb{E} \Big( V_{t+1}(x_{t+1}) \mid x_t=x,a_t=a \Big)\\ &= \max_{a\in A}\sum_{x'} \mathbb P(x_{t+1}=x' \,|\, x_t=x,a_t=a)\, V_{t+1}(x') \end{align}
I am confused about how to actually implement this (I'm trying to write pseudocode). In particular, I don't know how to initialize the recursion: many dynamic programming treatments set the value function at the final time to zero, but that doesn't seem to make sense here.
Question: How do I implement the dynamic programming equations above?
My ideas so far: Since the reward function is known, perhaps we can initialize $V_T(x)$ to $r(x)$ for every possible state $x$? Then, applying the recursion for $V_t(x)$, $t<T$, we can step backward all the way to $V_0(x)$. Doing this for every state $x$ and recording the maximizing action at each step would give a table of optimal actions $a$ for any state $x$ and time $t$ we may encounter. Is this correct?
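To make my idea concrete, here is a sketch of the backward recursion in Python. The transition probabilities, rewards, and horizon here are made up purely for illustration; I store the kernel as a 3-D array `P[a, x, x']`:

```python
import numpy as np

# Hypothetical example: 2 states, 2 actions, horizon T = 3.
# P[a, x, x'] = P(x_{t+1} = x' | x_t = x, a_t = a); each row over x' sums to 1.
P = np.array([
    [[0.9, 0.1],    # action 0, from state 0
     [0.5, 0.5]],   # action 0, from state 1
    [[0.2, 0.8],    # action 1, from state 0
     [0.1, 0.9]],   # action 1, from state 1
])
r = np.array([0.0, 1.0])   # terminal reward r(x)
T = 3

n_actions, n_states, _ = P.shape
V = r.copy()                          # initialize V_T(x) = r(x)
policy = np.zeros((T, n_states), dtype=int)

for t in range(T - 1, -1, -1):        # step back from t = T-1 down to t = 0
    # Q[a, x] = sum over x' of P(x'|x,a) * V_{t+1}(x')
    Q = P @ V                         # shape (n_actions, n_states)
    policy[t] = Q.argmax(axis=0)      # record the maximizing action per state
    V = Q.max(axis=0)                 # V_t(x)

print(V)        # V_0(x): optimal expected terminal reward from each state
print(policy)   # policy[t, x]: optimal action at time t in state x
```

The loop computes the expectation in the recursion as a matrix-vector product per action, then maximizes over actions; `policy` is exactly the lookup table of optimizers I described above.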