Markov Decision Process question


I'm working through Kallenberg's lecture notes on MDPs (https://www.math.leidenuniv.nl/~kallenberg/Lecture-notes-MDP.pdf) and have run into this question:

> Suppose you have an employee and at the beginning of each month you can decide on his salary for that month: either a low salary (\$ 2300) or a high salary (\$ 3000). Knowing his salary, the employee can decide to send in his resignation immediately. The probability that he sends in his resignation depends on his salary: 40% for a low salary and 20% for a high salary. When the employee quits, a temporary employee has to be hired immediately for \$ 4000 per month. When you have a temporary employee you will advertise each month for a new permanent employee. The probability to find a new permanent employee (who can start at the beginning of the following month and will receive the same salary conditions as the original employee) depends on the advertising budget: 70% for advertising budget \$ 300 and 90% for advertising budget \$ 600. Each month you have to decide which salary is offered to an employee and if the employee resigns you have to choose the advertising budget. What is for you an optimal policy if only the next six months are considered?

I'm not sure how to formulate this as an MDP. Because the employee can resign immediately, the state can change within a single month: from Permanent to Temporary when he resigns, and then back to Permanent at the beginning of month $t+1$ if the advertisement succeeds. How should I approach this?
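
For concreteness, here is a minimal sketch of one possible formulation, not necessarily the one intended in the notes. It uses two states observed at the start of each month, `P` (a permanent employee is on the payroll) and `T` (a temporary employee is in place), and treats an immediate resignation as a jump from `P` to `T` within the same month. That gives $V_t(T) = \min_a \{4000 + a + p_a V_{t+1}(P) + (1-p_a) V_{t+1}(T)\}$ and $V_t(P) = \min_s \{(1-q_s)(s + V_{t+1}(P)) + q_s V_t(T)\}$, with $V_7 \equiv 0$. The assumptions are mine: the objective is to minimize total expected cost over six months, a resigning employee is not paid for that month, and advertising is mandatory whenever a temporary employee is in place (even in month 6). The names `solve`, `SALARIES`, `AD_BUDGETS` and `TEMP_COST` are illustrative.

```python
# Data from the problem statement.
SALARIES = {2300: 0.4, 3000: 0.2}   # salary -> probability of immediate resignation
AD_BUDGETS = {300: 0.7, 600: 0.9}   # advertising budget -> probability of a hire
TEMP_COST = 4000                    # monthly cost of a temporary employee
HORIZON = 6                         # number of months considered


def solve(horizon=HORIZON):
    """Backward induction over months horizon, ..., 1.

    Returns (V, policy), where V[t][state] is the minimal expected cost
    from month t onward and policy[t][state] is the minimizing action
    (a salary in state "P", an advertising budget in state "T").
    """
    V = {horizon + 1: {"P": 0.0, "T": 0.0}}  # assumed: no cost after the horizon
    policy = {}
    for t in range(horizon, 0, -1):
        nxt = V[t + 1]
        # State T: pay the temp and the ad budget, then move to P with
        # probability p (hire found) or stay in T otherwise.
        cost_t, best_ad = min(
            (TEMP_COST + a + p * nxt["P"] + (1 - p) * nxt["T"], a)
            for a, p in AD_BUDGETS.items()
        )
        # State P: with probability q the employee resigns at once and the
        # month continues as if it had started in T (assumed: no salary paid);
        # otherwise pay the salary and start next month in P.
        cost_p, best_salary = min(
            ((1 - q) * (s + nxt["P"]) + q * cost_t, s)
            for s, q in SALARIES.items()
        )
        V[t] = {"P": cost_p, "T": cost_t}
        policy[t] = {"P": best_salary, "T": best_ad}
    return V, policy


if __name__ == "__main__":
    V, policy = solve()
    for t in sorted(policy):
        print(t, policy[t], V[t])
```

Folding the resignation into the same month's recursion avoids a separate "just resigned" state; an equivalent alternative would be to let the action in state `P` be a pair (salary, advertising budget to use if he resigns).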