Immediate reward in a continuous time Markov decision process


I don't have much background in MDPs, but I was reading a text in which I came across the following, which I am finding difficult to understand. Consider a Markov decision process with state space $S$ where the only action possible in state $s=0$ is $a=0$; the reward on completing the action is $r_a$, and the task time is exponentially distributed with rate $\mu_a$. Per the model in question (the details of which I have not included here), there exists a finite uniformization constant $q$ for the CTMC. The text then claims that the immediate reward obtained when action $a=0$ is chosen in state $s=0$ is $$r(0,0)=r_a\frac{\mu_a}{q}.$$ I don't understand how this is obtained. I know what uniformization of a CTMC is, but I don't see how to relate it to the reward. In particular, how can the reward be less than $r_a$? Any help would be appreciated.
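My current guess is that under uniformization each step of the discrete-time chain is a potential event at rate $q$, and the task (rate $\mu_a$) actually completes at any given step only with probability $\mu_a/q$, so the *expected* reward at a step is $r_a\,\mu_a/q$. Here is a quick Monte Carlo sketch of that interpretation (the values of $r_a$, $\mu_a$, $q$ are arbitrary ones I picked; this is just my reading of the model, not the text's):

```python
import random

# Arbitrary example values with mu_a <= q (as uniformization requires).
r_a, mu_a, q = 5.0, 2.0, 3.0

# In the uniformized chain, each discrete step corresponds to a potential
# event of the Poisson clock with rate q. The pending task completes at a
# given step with probability mu_a / q; only then is r_a collected.
random.seed(0)
n_steps = 200_000
total = sum(r_a if random.random() < mu_a / q else 0.0
            for _ in range(n_steps))
avg_reward_per_step = total / n_steps

print(avg_reward_per_step)   # empirically close to r_a * mu_a / q
print(r_a * mu_a / q)
```

The average reward per uniformized step comes out close to $r_a\,\mu_a/q$, which would explain why it is less than $r_a$: most steps are "dummy" transitions in which nothing completes. Is this the right way to read the formula?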