Policy gradient baseline function

24 Views Asked by Bumbble Comm At 27 Mar 2026 - 9:43

At the bottom of page ten of the following paper on probabilistic reinforcement learning, there are three equations where the author manipulates the policy gradient $\nabla_\theta J(\theta)$.

Can someone please explain to me how to derive the last (third) line from the previous (second) line?

I feel like we have to prove one of the following expressions, but I don't know how to go about it.

$$ \nabla_\theta \log q_\theta(a_t \mid s_t) \left( \sum_{t'=t}^T b(s_{t'}) \right) = 0 $$ or $$ \mathbb{E}_{(s_t,a_t) \sim q(s_t,a_t)}\left[ \nabla_\theta \log q_\theta(a_t \mid s_t) \left( \sum_{t'=t}^T b(s_{t'}) \right) \right] = 0 $$
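As a sanity check on which of the two is plausible, here is a minimal numerical sketch of the single-step case ($t' = t$ only). It assumes a softmax policy over 4 discrete actions in one fixed state and a constant baseline value `b`; none of this comes from the paper, and the helper names (`softmax`, `score`, `expected_score_times_baseline`) are made up for illustration. The idea is that $\sum_a q_\theta(a \mid s_t) \nabla_\theta \log q_\theta(a \mid s_t) = \nabla_\theta \sum_a q_\theta(a \mid s_t) = \nabla_\theta 1 = 0$, so a term that depends only on $s_t$ drops out in expectation, while the pointwise expression is generally nonzero.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def score(theta, a):
    # Gradient of log softmax(theta)[a] with respect to theta:
    # one-hot(a) - softmax(theta).
    grad = -softmax(theta)
    grad[a] += 1.0
    return grad

def expected_score_times_baseline(theta, baseline):
    # Exact expectation over actions a ~ q_theta(.|s_t) of
    # grad log q_theta(a|s_t) * b(s_t), for one fixed state s_t.
    probs = softmax(theta)
    return sum(probs[a] * score(theta, a) * baseline for a in range(len(theta)))

rng = np.random.default_rng(0)
theta = rng.normal(size=4)  # hypothetical policy parameters (logits)
b = 3.7                     # hypothetical baseline value b(s_t)

expectation = expected_score_times_baseline(theta, b)
pointwise = score(theta, 0) * b  # the same quantity for one sampled action

print("expectation:", expectation)  # zero up to floating-point error
print("pointwise:  ", pointwise)    # not zero in general
```

So only the expectation form has a chance of holding. Note that this sketch covers the $t' = t$ term alone; for $t' > t$ the state $s_{t'}$ can depend on $a_t$, so $b(s_{t'})$ cannot be pulled out of the expectation over $a_t$ in the same way, and those terms would need a separate argument.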

Thanks for reading.
