At the bottom of page ten of the following paper on probabilistic reinforcement learning, there are three equations in which the author manipulates the policy gradient $\nabla_\theta J(\theta)$.
Can someone please explain to me how to derive the last (third) line from the previous (second) line?
I think it would be enough to prove either of the following expressions, but I don't know how to go about it:
$$ \nabla_\theta \log q_\theta(a_t|s_t) \left(\sum_{t'=t}^T b(s_{t'})\right) = 0 $$
or
$$ E_{(s_t,a_t) \sim q(s_t,a_t)}\left[\nabla_\theta \log q_\theta(a_t|s_t) \left(\sum_{t'=t}^T b(s_{t'})\right)\right] = 0 $$
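For context, one identity that arguments like this usually rely on (an assumption about what the paper uses, not confirmed from it) is that the score function has zero mean under the policy. For discrete actions:
$$ E_{a_t \sim q_\theta(a_t|s_t)}\left[\nabla_\theta \log q_\theta(a_t|s_t)\right] = \sum_{a_t} \nabla_\theta q_\theta(a_t|s_t) = \nabla_\theta \sum_{a_t} q_\theta(a_t|s_t) = \nabla_\theta 1 = 0, $$
so multiplying by anything that depends on $s_t$ alone, such as $b(s_t)$, still gives zero expectation. This directly covers only the $t' = t$ term of the sum; $b(s_{t'})$ for $t' > t$ can depend on $a_t$ through the dynamics. The sketch below checks the identity numerically, assuming a softmax policy over 4 discrete actions for one fixed state; the logits, the baseline value 3.7, and the helper names are hypothetical.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_softmax(logits, a):
    # Gradient w.r.t. the logits of log softmax(logits)[a] is one_hot(a) - softmax(logits).
    probs = softmax(logits)
    return [(1.0 if k == a else 0.0) - p for k, p in enumerate(probs)]

def expected_score_times(logits, f):
    """Exact E_{a ~ q}[grad log q(a) * f(a)] for one fixed state, summing over all actions."""
    probs = softmax(logits)
    result = [0.0] * len(logits)
    for a, p in enumerate(probs):
        g = grad_log_softmax(logits, a)
        for k in range(len(logits)):
            result[k] += p * g[k] * f(a)
    return result

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(4)]  # hypothetical theta for one fixed s_t
b_s = 3.7  # hypothetical baseline value b(s_t): depends on the state only

# Weight that ignores the action (a state-only baseline): expectation vanishes.
state_only = expected_score_times(logits, lambda a: b_s)
# Weight that depends on the action: expectation is generally nonzero.
action_dep = expected_score_times(logits, lambda a: float(a))

print(max(abs(x) for x in state_only) < 1e-12)  # True
print(max(abs(x) for x in action_dep) < 1e-12)  # False
```

The expectation is computed exactly by enumerating actions rather than by sampling, so the zero result is not Monte Carlo noise.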
Thanks for reading.