I was reading this article about discounted return (in the context of MDPs): http://deeplizard.com/learn/video/a-SnJtmBtyA
I got to this section:
> Now, check out this relationship below showing how returns at
> successive time steps are related to each other.
> We’ll make use of this relationship later.
[please use the image url (below) if it doesn't appear here][1]
[1]: https://i.stack.imgur.com/KgOML.png
The extract shows 3 maths formulas. I have 3 questions on this:
1- I noticed that R_{t+3} occurs twice across the time steps. Could this be a typo? I.e., shouldn't the next term be R_{t+4}? Or is this correct? If so, it doesn't make sense to me.
2- I didn't understand how, in the third formula at the bottom, the expression was changed to γG_{t+1} (γ being the discount, gamma).
3- Why is the discount γ increased exponentially (i.e. its power increases by one) with every time step? Doesn't that seem like a dramatic change in the discount with every time step (as opposed to, say, multiplying the discount by an increasing coefficient equal to the time step value)?
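To make questions 1 and 2 concrete, here is a small sketch of how I currently read the definition. It assumes the usual textbook form G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + γ³R_{t+4} + ..., which may not be exactly what the image shows. The function name, the reward values and γ = 0.9 are all made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], where rewards = [R_{t+1}, R_{t+2}, ...]."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


gamma = 0.9                       # assumed discount rate
rewards = [1.0, 2.0, 3.0, 4.0]    # made-up values for R_{t+1} .. R_{t+4}

# G_t computed directly from the (assumed) definition.
g_t = discounted_return(rewards, gamma)

# G_{t+1} is the same sum started one step later, i.e. without R_{t+1}.
g_t_plus_1 = discounted_return(rewards[1:], gamma)

# The relationship I am asking about: G_t = R_{t+1} + gamma * G_{t+1}.
recursive = rewards[0] + gamma * g_t_plus_1

print(g_t, recursive)
```

If my reading is right, the two printed numbers should agree up to floating-point rounding, and each reward index (t+1, t+2, t+3, t+4) should appear only once in the sum.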
Many thanks in advance for any help.