It's an exam problem I found online. Here's a link to the past paper.
The problem is stated as follows.
A repairman who services Q facilities moves between locations s and j according to the transition probability p(j|s). An equipment trailer which carries parts may be located at any one of M sites. If the trailer is at site m and the repairman is at facility j, the cost of obtaining material from the trailer is c(m,j). The cost of moving the trailer from site m to site j is d(m,j). The decision maker's objective is to dynamically relocate the trailer so as to minimize the expected cost. Assume that the decision maker observes the locations of the repairman and the trailer, relocates the trailer, and then the repairman moves and services a facility. Formulate this as an infinite-horizon discounted Markov decision process. Does there exist a Markovian deterministic policy which is optimal? Justify your answer.
I think the problem can be tackled with value iteration, but I am not sure how exactly to formulate it as a Markov decision process.
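Here is my attempt at a value-iteration sketch, in case it helps pin down where my formulation goes wrong. I am assuming the state is the pair (repairman location s, trailer site m), the action is the new trailer site m', the stage cost is d(m, m') plus the expected material cost, and the Bellman equation is V(s,m) = min over m' of [ d(m,m') + Σ_j p(j|s)(c(m',j) + γ V(j,m')) ]. The instance sizes (Q=4 facilities, M=3 sites) and the random costs are made up by me, not part of the exam problem.

```python
import numpy as np

# Hypothetical small instance (sizes and costs are my assumptions).
rng = np.random.default_rng(0)
Q, M, gamma = 4, 3, 0.9

P = rng.random((Q, Q))
P /= P.sum(axis=1, keepdims=True)          # p(j|s): repairman transition probabilities
c = rng.random((M, Q))                     # c(m, j): cost of obtaining material
d = rng.random((M, M))
np.fill_diagonal(d, 0.0)                   # d(m, m'): trailer relocation cost, free to stay put

def bellman(V):
    """One step of the Bellman operator for the timing
    'observe (s, m), relocate trailer to m', then repairman moves to j ~ p(.|s)':
        V(s, m) = min_{m'} [ d(m, m') + sum_j p(j|s) * (c(m', j) + gamma * V(j, m')) ]
    """
    # future[s, m'] = sum_j P[s, j] * (c[m', j] + gamma * V[j, m'])
    future = P @ (c.T + gamma * V)                 # shape (Q, M)
    Qsa = d[None, :, :] + future[:, None, :]       # shape (Q, M, M): indexed by (s, m, m')
    return Qsa.min(axis=2), Qsa.argmin(axis=2)

# Value iteration to (numerical) convergence.
V = np.zeros((Q, M))
for _ in range(2000):
    V_new, policy = bellman(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

The minimizer `policy[s, m]` is then a Markovian deterministic rule: which site to move the trailer to in each state.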
Any explanation would be appreciated.