How to derive the policy gradient for finite-difference methods used for policy search?


According to *A Survey on Policy Search for Robotics* (https://spiral.imperial.ac.uk/bitstream/10044/1/12051/4/2300000021-Deisenroth-Vol2-ROB-021_published.pdf),

[The screenshots show the survey's finite-difference gradient estimator: small perturbations $\Delta\theta_i$ of the policy parameters $\theta$ are applied, the resulting changes in expected return $\Delta\hat{J}_i \approx J(\theta + \Delta\theta_i) - J(\theta)$ are collected, and the gradient is estimated by least squares as]

$$\nabla_\theta J \approx (\Delta\Theta^{\mathsf T} \Delta\Theta)^{-1} \Delta\Theta^{\mathsf T} \Delta\hat{J},$$

[where the rows of $\Delta\Theta$ are the perturbations $\Delta\theta_i^{\mathsf T}$.]

Firstly, I cannot see how $\nabla_\theta J$ can be derived from the perturbations. Using a first-order Taylor expansion, I can only see that the term given at the end is $\nabla R$. Is it that $\nabla_\theta J = \nabla R$? If so, why? If not, how can I derive this result?
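For concreteness, here is a minimal sketch (not from the post) of the finite-difference estimator as I understand it: a first-order Taylor expansion gives $\Delta\hat{J}_i \approx \Delta\theta_i^{\mathsf T}\, \nabla_\theta J$ for each perturbation, so stacking the perturbations yields an over-determined linear system whose least-squares solution is the gradient estimate. The quadratic objective below is a toy stand-in for the true expected return $J(\theta)$, which in practice would be estimated from rollouts.

```python
import numpy as np

def J(theta):
    # Toy stand-in for the expected return J(theta);
    # in a real policy-search setting this would be estimated from rollouts.
    return -np.sum((theta - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(3)   # current policy parameters
N = 50                # number of perturbations
sigma = 1e-3          # perturbation scale

# Rows of dTheta are the perturbations Delta-theta_i;
# dJ holds the corresponding return differences Delta-J_i.
dTheta = rng.normal(scale=sigma, size=(N, theta.size))
dJ = np.array([J(theta + d) - J(theta) for d in dTheta])

# Least-squares solution of dTheta @ grad = dJ, i.e.
# grad ≈ (dTheta^T dTheta)^{-1} dTheta^T dJ
grad_fd, *_ = np.linalg.lstsq(dTheta, dJ, rcond=None)

# Analytic gradient of the toy objective, for comparison
grad_true = -2.0 * (theta - 1.0)
print(grad_fd, grad_true)
```

The least-squares form reduces to the pseudo-inverse expression from the survey; with perturbations small enough that higher-order Taylor terms are negligible, the estimate matches the analytic gradient closely.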