Need help solving the finite-horizon problem

Asked by Bumbble Comm on 07 Apr 2026, 3:16 (43 views)

I have been trying to solve the following set of equations for a while. There are a few things that I understand and some that I don't. I would really appreciate it if someone could help me out. Thank you!

$$\theta_{k+1} = \theta_{k} + \alpha \nabla_\theta J(\pi_\theta)|_{\theta_{k}}$$

$$\nabla_\theta J(\pi_\theta) = \nabla_\theta E_{\tau\sim\pi_{\theta}}[R(\tau)]$$

$$P(\tau|\theta) = \rho_{0}(S_0)\prod\limits_{t=0}^T P(S_{t+1}|S_t, a_t)\,\pi_\theta(a_{t}|S_t)$$
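To make the trajectory-probability equation concrete, here is a minimal sketch that evaluates $P(\tau|\theta)$ on a small hypothetical MDP (two states, two actions, made-up $\rho_0$ and transition table) with a tabular softmax policy. The MDP, the softmax parametrisation, and the names `RHO0`, `TRANS`, `policy` and `trajectory_prob` are all assumptions for illustration, not part of the question.

```python
import itertools
import math

# Hypothetical toy MDP (not from the question): 2 states, 2 actions.
RHO0 = [0.5, 0.5]                      # rho_0(s): initial-state distribution
TRANS = [                              # TRANS[s][a][s_next] = P(s_next | s, a)
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]


def policy(theta, s):
    """Tabular softmax policy: pi_theta(a | s) proportional to exp(theta[s][a])."""
    m = max(theta[s])                  # subtract the max for numerical stability
    exps = [math.exp(logit - m) for logit in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def trajectory_prob(theta, states, actions):
    """rho_0(s_0) * prod_{t=0}^{T} P(s_{t+1} | s_t, a_t) * pi_theta(a_t | s_t).

    With T = len(actions) - 1, the product runs up to s_{T+1},
    so the trajectory needs len(actions) + 1 states.
    """
    assert len(states) == len(actions) + 1
    p = RHO0[states[0]]
    for t, a in enumerate(actions):
        s, s_next = states[t], states[t + 1]
        p *= TRANS[s][a][s_next] * policy(theta, s)[a]
    return p


theta = [[0.0, 0.0], [0.0, 0.0]]       # all-zero logits: uniform policy
print(trajectory_prob(theta, [0, 1, 1], [1, 0]))

# Sanity check: probabilities of all trajectories with T = 1 sum to 1.
total = sum(
    trajectory_prob(theta, list(ss), list(aa))
    for ss in itertools.product(range(2), repeat=3)
    for aa in itertools.product(range(2), repeat=2)
)
print(total)
```

In this sketch the policy factor $\pi_\theta(a_t|S_t)$, the probability of choosing $a_t$ in state $S_t$, is `policy(theta, s)[a]`.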

The first equation is the gradient-ascent update used in policy gradient methods. Since the policy $\pi_\theta$ is fully determined by its parameters, we can write $J(\theta)$ as shorthand for $J(\pi_\theta)$, which gives

$$\theta_{k+1} = \theta_{k} + \alpha \nabla_\theta J(\theta_{k})$$
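The update rule above is ordinary gradient ascent. A minimal sketch, assuming a hypothetical one-dimensional objective $J(\theta) = -(\theta-3)^2$ whose gradient is known in closed form; in policy gradient, $\nabla_\theta J$ would instead have to be estimated from sampled trajectories. The names `grad_J` and `gradient_ascent` are illustrative only.

```python
def grad_J(theta):
    # Hypothetical toy objective J(theta) = -(theta - 3)^2, maximized at theta = 3.
    return -2.0 * (theta - 3.0)


def gradient_ascent(theta0, alpha, steps):
    theta = theta0
    for _ in range(steps):
        # theta_{k+1} = theta_k + alpha * grad J(theta_k)
        theta = theta + alpha * grad_J(theta)
    return theta


print(gradient_ascent(0.0, 0.1, 100))
```

Each step here shrinks the distance to the maximizer by a factor of $1 - 2\alpha = 0.8$, so after 100 steps $\theta$ is very close to 3.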

Now, the second equation says the objective is an expectation over trajectories $\tau$, so we can write it as an integral and move the gradient inside:

$$\nabla_\theta J(\pi_\theta) = \nabla_\theta\int P(\tau|\theta)R(\tau)\,d\tau$$

$$\nabla_\theta J(\pi_\theta) = \int \nabla_\theta P(\tau|\theta)R(\tau)\,d\tau$$

$$\because \nabla_\theta \log P(\tau|\theta)= \frac{\nabla_\theta P(\tau|\theta)}{P(\tau|\theta)}$$
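The log-derivative identity is what turns the gradient of an expectation into an expectation that can be estimated from samples: $\int \nabla_\theta P\,R\,d\tau = \int P\,\nabla_\theta\log P\,R\,d\tau$. As a minimal sketch, assuming a one-dimensional stand-in where $x \sim \mathcal{N}(\theta, 1)$ plays the role of $\tau$ and $f(x) = x^2$ plays the role of $R(\tau)$ (so the exact gradient is $2\theta$); the function name `score_function_grad` is hypothetical.

```python
import random


def score_function_grad(theta, f, n_samples, seed=0):
    """Monte Carlo estimate of d/dtheta E_{x ~ N(theta, 1)}[f(x)]
    via E[f(x) * d/dtheta log p_theta(x)].  For N(theta, 1),
    d/dtheta log p_theta(x) = x - theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)
        total += f(x) * (x - theta)
    return total / n_samples


theta = 1.5
estimate = score_function_grad(theta, lambda x: x * x, 200_000)
exact = 2 * theta   # E[x^2] = theta^2 + 1, so its derivative is 2 * theta
print(estimate, exact)
```

The estimator only needs samples of $x$ and the score $\nabla_\theta \log p_\theta(x)$, never the derivative of $f$ itself, which is why the same trick works when $R(\tau)$ is not differentiable.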

Next, I took the log of the trajectory probability (the third equation above) and then its gradient:

$$P(\tau|\theta) = \rho_{0}(S_0)\prod\limits_{t=0}^T P(S_{t+1}|S_t, a_t)\,\pi_\theta(a_{t}|S_t)$$

$$\log P(\tau|\theta) = \log\rho_{0}(S_0)+\sum\limits_{t=0}^T \big(\log P(S_{t+1}|S_t, a_t)+ \log\pi_\theta(a_{t}|S_t)\big)$$

$$\nabla_\theta\log P(\tau|\theta) = \nabla_\theta\log\rho_{0}(S_0)+\sum\limits_{t=0}^T \big(\nabla_\theta\log P(S_{t+1}|S_t, a_t)+ \nabla_\theta\log\pi_\theta(a_{t}|S_t)\big)$$

$$\frac{\nabla_\theta P(\tau|\theta)}{P(\tau|\theta)}= \nabla_\theta\log\rho_{0}(S_0)+\sum\limits_{t=0}^T \big(\nabla_\theta\log P(S_{t+1}|S_t, a_t)+ \nabla_\theta\log\pi_\theta(a_{t}|S_t)\big)$$
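Because $\rho_0(S_0)$ and $P(S_{t+1}|S_t,a_t)$ come from the environment and do not depend on $\theta$, their gradient terms in the expansion above are zero. A quick numerical check of that, assuming a small hypothetical MDP (two states, two actions, made-up probabilities) and a tabular softmax policy; every name below is illustrative. It compares a finite-difference gradient of the full $\log P(\tau|\theta)$ with the sum of the policy terms alone.

```python
import math

# Hypothetical toy MDP (not from the question): 2 states, 2 actions.
RHO0 = [0.5, 0.5]
TRANS = [[[0.9, 0.1], [0.2, 0.8]],     # TRANS[s][a][s_next]
         [[0.7, 0.3], [0.1, 0.9]]]


def policy(theta, s):
    """Tabular softmax policy pi_theta(. | s)."""
    exps = [math.exp(logit) for logit in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def log_traj_prob(theta, states, actions):
    """log rho_0(s_0) + sum_t [log P(s_{t+1}|s_t,a_t) + log pi_theta(a_t|s_t)]."""
    lp = math.log(RHO0[states[0]])
    for t, a in enumerate(actions):
        s, s_next = states[t], states[t + 1]
        lp += math.log(TRANS[s][a][s_next]) + math.log(policy(theta, s)[a])
    return lp


def policy_score_sum(theta, states, actions):
    """sum_t grad_theta log pi_theta(a_t|s_t) for the tabular softmax policy:
    d log pi(a|s) / d theta[s'][b] = [s' == s] * ([b == a] - pi(b|s))."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for t, a in enumerate(actions):
        s = states[t]
        probs = policy(theta, s)
        for b in range(2):
            grad[s][b] += (1.0 if b == a else 0.0) - probs[b]
    return grad


def finite_diff_grad(theta, states, actions, eps=1e-6):
    """Central-difference gradient of the full log P(tau|theta), env terms included."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            plus = [row[:] for row in theta]
            minus = [row[:] for row in theta]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (log_traj_prob(plus, states, actions)
                          - log_traj_prob(minus, states, actions)) / (2 * eps)
    return grad


theta = [[0.3, -0.2], [1.0, 0.5]]
states, actions = [0, 1, 1, 0], [1, 0, 1]
print(finite_diff_grad(theta, states, actions))
print(policy_score_sum(theta, states, actions))
```

The two printed grids should agree up to finite-difference error, even though only the second one ignores the $\rho_0$ and transition terms.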

This is as far as I have managed to get. Is it correct so far, or is there a further step I need to take? I'm stuck at this point. Any help is appreciated. Thank you!
