It's an exam problem I found online. Here's a link to the past paper.
The problem is stated as follows.
A repairman who services Q facilities moves between locations s and j according to the transition probability p(j|s). An equipment trailer which carries parts may be located at any one of M sites. If the trailer is at site m and the repairman is at facility j, the cost of obtaining material from the trailer is c(m,j). The cost of moving the trailer from site m to site j is d(m,j). The decision maker's objective is to dynamically relocate the trailer so as to minimize the expected cost. Assume that the decision maker observes the locations of the repairman and the trailer, relocates the trailer, and then the repairman moves and services a facility. Formulate this as an infinite-horizon discounted Markov decision process. Does there exist a Markovian deterministic policy which is optimal? Justify your answer.
I think the problem can be tackled with value iteration, but I am not sure how exactly to formulate it as a Markov decision process.
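Here is my attempt at a value-iteration sketch, in case it helps pin down where my formulation goes wrong. I am assuming the state is the pair (repairman location s, trailer site m), the action is the new trailer site m', the stage cost is d(m, m') plus the expected material cost, and the Bellman equation is V(s,m) = min over m' of [ d(m,m') + Σ_j p(j|s)(c(m',j) + γ V(j,m')) ]. The instance sizes (Q=4 facilities, M=3 sites) and the random costs are made up by me, not part of the exam problem.

```python
import numpy as np

# Hypothetical small instance (sizes and costs are my assumptions).
rng = np.random.default_rng(0)
Q, M, gamma = 4, 3, 0.9

P = rng.random((Q, Q))
P /= P.sum(axis=1, keepdims=True)          # p(j|s): repairman transition probabilities
c = rng.random((M, Q))                     # c(m, j): cost of obtaining material
d = rng.random((M, M))
np.fill_diagonal(d, 0.0)                   # d(m, m'): trailer relocation cost, free to stay put

def bellman(V):
    """One step of the Bellman operator for the timing
    'observe (s, m), relocate trailer to m', then repairman moves to j ~ p(.|s)':
        V(s, m) = min_{m'} [ d(m, m') + sum_j p(j|s) * (c(m', j) + gamma * V(j, m')) ]
    """
    # future[s, m'] = sum_j P[s, j] * (c[m', j] + gamma * V[j, m'])
    future = P @ (c.T + gamma * V)                 # shape (Q, M)
    Qsa = d[None, :, :] + future[:, None, :]       # shape (Q, M, M): indexed by (s, m, m')
    return Qsa.min(axis=2), Qsa.argmin(axis=2)

# Value iteration to (numerical) convergence.
V = np.zeros((Q, M))
for _ in range(2000):
    V_new, policy = bellman(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

The minimizer `policy[s, m]` is then a Markovian deterministic rule: which site to move the trailer to in each state.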
Any explanation would be appreciated.