In an MDP (Markov Decision Process), the reward of an action is usually assumed to be known immediately. What if the reward of an action depends on a later state?
For example, consider a system of $m$ servers with exponential service rates $\mu_i,\ i = 1,\dots,m$, facing customers who arrive according to a Poisson process with rate $\lambda$. As the system manager, I must decide which server each arriving customer joins so as to minimize the total average waiting time. Here, a customer's waiting time is not known immediately after I make the 'dispatching' decision.
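To make the setting concrete, here is a minimal event-driven simulation sketch of the dispatch problem. The function names (`simulate_dispatch`, `earliest_free`) and the heuristic policy are my own illustrative choices, not part of the question; the point is that the waiting time incurred by a dispatching decision is only realized later, once the chosen server works through its backlog.

```python
import random

def simulate_dispatch(lam, mus, policy, n_customers=10000, seed=0):
    """Simulate dispatching Poisson arrivals (rate lam) to m servers
    with exponential service rates mus[i]; each server serves its own
    FIFO queue. Returns the average waiting time (delay before service
    starts) over n_customers arrivals."""
    rng = random.Random(seed)
    free_at = [0.0] * len(mus)  # time at which each server next becomes idle
    t, total_wait = 0.0, 0.0
    for _ in range(n_customers):
        t += rng.expovariate(lam)          # next Poisson arrival time
        i = policy(free_at, t, mus)        # the dispatching decision
        total_wait += max(0.0, free_at[i] - t)  # wait realized only later
        service = rng.expovariate(mus[i])  # exponential service time
        free_at[i] = max(free_at[i], t) + service
    return total_wait / n_customers

def earliest_free(free_at, t, mus):
    """A simple heuristic policy: send the arrival to the server that
    will become idle soonest."""
    return min(range(len(free_at)), key=lambda i: free_at[i])
```

Under such a simulation one can compare policies empirically, but the question remains how to cast the problem as an MDP when the per-decision cost (the waiting time) is revealed with delay.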
Can this problem be formulated as an MDP? I have found some papers that discuss similar problems, but their objectives are not quite the same: for example, some minimize the average queue length, while others maximize the total number of completed jobs.