Markov Decision Process - Optimal policy invariance to scaling in the Utility Function


The title says it all. If I use a discounted utility function, why is the optimal policy invariant with respect to scaling of the utility function by a positive factor?


Let $u$ be your utility function, $\alpha>0$ the scaling factor, and $\delta\in(0,1)$ the discount factor.

Let's start with deterministic processes. You want a process $(x_n^*)$ that gives higher discounted utility than every other path $(x_n)$: $$\sum_n \delta^n u(x_n^*)\geq\sum_n \delta^n u(x_n).$$ Since $\alpha>0$, this is clearly equivalent to $$\alpha\sum_n \delta^n u(x_n^*)\geq\alpha\sum_n \delta^n u(x_n)$$ $$\sum_n \delta^n \alpha u(x_n^*)\geq\sum_n \delta^n \alpha u(x_n).$$ So the optimal policy does not change. Since expectation is also linear, the same argument goes through in the stochastic case.
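One can also check this numerically. Here is a minimal sketch (the two-state MDP, its transition matrix `P`, rewards `R`, and the scaling factor `alpha` are all made-up illustrative values, not from the question): value iteration is run on the original rewards and on the rewards scaled by a positive factor, and the resulting greedy policies coincide.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (hypothetical numbers):
# P[a, s, s'] = transition probability, R[a, s] = reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [2.0, -1.0]])
delta = 0.9  # discount factor

def optimal_policy(R, P, delta, iters=500):
    """Value iteration; returns the greedy (optimal) action per state."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + delta * (P @ V)  # Q[a, s] = r(a, s) + delta * E[V(s')]
        V = Q.max(axis=0)        # Bellman optimality update
    return Q.argmax(axis=0)      # best action in each state

alpha = 7.3  # any positive scaling factor
pi = optimal_policy(R, P, delta)
pi_scaled = optimal_policy(alpha * R, P, delta)
assert np.array_equal(pi, pi_scaled)  # scaling leaves the policy unchanged
```

Note that the argument fails for $\alpha<0$ (the inequalities flip) and for affine shifts combined with state-dependent horizons, so positivity of the factor is essential.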