Justification of a Markov decision process with finite states and actions to assume rewards in $[0, 1]$


For reinforcement learning, consider a Markov decision process $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma),$ where the numbers of states $|\mathcal{S}|$ and actions $|\mathcal{A}|$ are finite. In the following notes on RL Theory, the authors claim: "there is no loss in generality by assuming that all the rewards belong to the $[0,1]$ interval." Here, the rewards refer to $r$ in $\mathcal{M}.$

What is the justification behind this claim? I have not encountered it in other sources. In particular, how does the finiteness of $\mathcal{S}$ and $\mathcal{A}$ come into play? From a motivational standpoint the assumption seems clear, since it makes bounds on the expected discounted return much simpler to express.
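For concreteness, here is a numerical sketch of the rescaling argument I suspect is intended: because $\mathcal{S} \times \mathcal{A}$ is finite, $r_{\min} = \min_{s,a} r(s,a)$ and $r_{\max} = \max_{s,a} r(s,a)$ are attained, so the affine map $r' = (r - r_{\min})/(r_{\max} - r_{\min})$ sends all rewards into $[0,1]$; since $Q$-values transform affinely with positive slope, greedy (hence optimal) policies are unchanged. The MDP below is an invented toy example, not from the notes in question.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions (all numbers invented for illustration).
n_s, n_a = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] is a distribution over next states
r = rng.uniform(-5.0, 10.0, size=(n_s, n_a))      # arbitrary bounded rewards
gamma = 0.9

def greedy_policy(r, P, gamma, iters=2000):
    """Value iteration; returns the greedy policy of the (near-)converged Q-function."""
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        V = Q.max(axis=1)            # V(s) = max_a Q(s, a)
        Q = r + gamma * (P @ V)      # Bellman optimality update
    return Q.argmax(axis=1)

# Affine rescaling into [0, 1]; finiteness guarantees r.min() and r.max() exist.
r_scaled = (r - r.min()) / (r.max() - r.min())

# Q' = a * Q + b/(1 - gamma) with a > 0, so the argmax over actions is preserved.
assert np.array_equal(greedy_policy(r, P, gamma),
                      greedy_policy(r_scaled, P, gamma))
```

The key point the example makes concrete: finiteness ensures the rewards are bounded with attained extrema, so the rescaling is well defined; the value function itself changes (by a known affine map), but nothing about which policies are optimal does.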