My question is about the Bellman equation for strategy $\sigma^{(1)}$ on the last two lines (I have attached pictures of the book below). If we know that all future states will have a value of 0, since we have spent $a=s$ in period 1, why do we need to keep the term $\delta W(\sigma^{(1)})(s-a) = \delta \sqrt{s-a}$ in the expression?
Here is the relevant passage from the book ---


This is the policy iteration solution algorithm, in which we iterate on the policy (strategy) function $\sigma^{(j)}$. The initial guess is $\sigma^{(0)}(s)=0$. The next policy $\sigma^{(1)}$ is found by solving the optimization problem in the Bellman equation using the continuation value resulting from $\sigma^{(0)}$, that is, $W(\sigma^{(0)})(s)=0$. Similarly, on the next iteration, the policy $\sigma^{(2)}$ is found by solving the Bellman equation with the continuation value $W(\sigma^{(1)})(s)=\sqrt{s}$. Note, however, that in the Bellman equation $W(\sigma^{(1)})$ is evaluated at $s-a$.
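To make the iteration concrete, here is a minimal numerical sketch of the steps described above, assuming the cake-eating setup with per-period utility $\sqrt{a}$ and an illustrative discount factor $\delta = 0.9$ (the grid sizes and the simulation-based policy evaluation are my own choices, not from the book):

```python
import numpy as np

delta = 0.9                          # discount factor (assumed for illustration)
grid = np.linspace(0.0, 1.0, 101)    # possible cake sizes s

def W_of_policy(sigma, horizon=200):
    """Value of following policy sigma (an array over grid) forever,
    approximated by forward simulation from each initial cake size s."""
    W = np.zeros_like(grid)
    for i, s0 in enumerate(grid):
        s, v, disc = s0, 0.0, 1.0
        for _ in range(horizon):
            a = min(np.interp(s, grid, sigma), s)  # consumption under sigma
            v += disc * np.sqrt(a)
            disc *= delta
            s -= a
        W[i] = v
    return W

def improve(W):
    """One policy-improvement step: for each s, maximize
    sqrt(a) + delta * W(s - a) over feasible consumption a in [0, s]."""
    sigma = np.zeros_like(grid)
    for i, s in enumerate(grid):
        a_grid = np.linspace(0.0, s, 201)
        values = np.sqrt(a_grid) + delta * np.interp(s - a_grid, grid, W)
        sigma[i] = a_grid[np.argmax(values)]
    return sigma

# Iteration 0: sigma0(s) = 0, so the continuation value W(sigma0)(s) = 0.
W0 = np.zeros_like(grid)
# Iteration 1: with zero continuation value, it is optimal to eat everything,
# so sigma1(s) = s, and evaluating this policy gives W(sigma1)(s) = sqrt(s).
sigma1 = improve(W0)
W1 = W_of_policy(sigma1)
# Iteration 2: now the continuation term delta * sqrt(s - a) enters the
# maximization, so it is no longer optimal to set a = s.
sigma2 = improve(W1)
```

Note that in `improve(W1)` the continuation value is evaluated at $s-a$, exactly as in the book's Bellman equation: even though the *current* policy $\sigma^{(1)}$ eats the whole cake, the term $\delta\sqrt{s-a}$ is what lets the *next* policy $\sigma^{(2)}$ trade off consumption today against the value of the cake left over.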