Markov Decision Process - Optimal policy invariance to scaling in the Utility Function


The title says it all. If I use a discounted utility function, why is the optimal policy invariant with respect to scaling of the utility function by a positive factor?


Let $u$ be your utility function, $\alpha>0$ the scaling factor, and $\delta\in(0,1)$ the discount factor.

Let's start with deterministic processes. You want a process $(x_n^*)$ that gives higher discounted utility than every other path $(x_n)$: $$\sum_n \delta^n u(x_n^*)\geq\sum_n \delta^n u(x_n).$$ Since $\alpha>0$, this is clearly equivalent to $$\alpha\sum_n \delta^n u(x_n^*)\geq\alpha\sum_n \delta^n u(x_n)$$ $$\sum_n \delta^n \alpha u(x_n^*)\geq\sum_n \delta^n \alpha u(x_n).$$ So the optimal policy does not change. Since expectation is also linear, the same argument goes through in the stochastic case.
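One can also check this numerically. Here is a minimal sketch (the two-state MDP, its transition matrix `P`, rewards `R`, and the scaling factor `alpha` are all made-up illustrative values, not from the question): value iteration is run on the original rewards and on the rewards scaled by a positive factor, and the resulting greedy policies coincide.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (hypothetical numbers):
# P[a, s, s'] = transition probability, R[a, s] = reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [2.0, -1.0]])
delta = 0.9  # discount factor

def optimal_policy(R, P, delta, iters=500):
    """Value iteration; returns the greedy (optimal) action per state."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + delta * (P @ V)  # Q[a, s] = r(a, s) + delta * E[V(s')]
        V = Q.max(axis=0)        # Bellman optimality update
    return Q.argmax(axis=0)      # best action in each state

alpha = 7.3  # any positive scaling factor
pi = optimal_policy(R, P, delta)
pi_scaled = optimal_policy(alpha * R, P, delta)
assert np.array_equal(pi, pi_scaled)  # scaling leaves the policy unchanged
```

Note that the argument fails for $\alpha<0$ (the inequalities flip) and for affine shifts combined with state-dependent horizons, so positivity of the factor is essential.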