I have a question about the policy gradient update in the Deep Deterministic Policy Gradient (DDPG) algorithm. I am implementing DDPG in Java using the DeepLearning4J library.
In the algorithm the following update is used:
$$\nabla_{\theta^{\mu}}J \approx \frac{1}{N}\sum_i\nabla_aQ(s, a | \theta^Q)|_{s=s_i,a=\mu(s_i)}\nabla_{\theta^{\mu}}\mu(s|\theta^{\mu})|_{s_i}$$
Can this be rewritten to:
$$\nabla_{\theta^{\mu}}J \approx \frac{1}{N}\nabla_{\theta^{\mu}}\left(\sum_i\nabla_aQ(s, a | \theta^Q)|_{s=s_i,a=\mu(s_i)}\mu(s|\theta^{\mu})|_{s_i}\right)$$
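My reasoning, written out for a single sample: if I define $g_i := \nabla_aQ(s, a | \theta^Q)|_{s=s_i,a=\mu(s_i)}$ and treat it as a constant with respect to $\theta^{\mu}$ (since $Q$'s parameters $\theta^Q$ are held fixed while updating the actor), then the chain rule gives

$$\nabla_{\theta^{\mu}}\left(g_i^{\top}\,\mu(s_i|\theta^{\mu})\right) = g_i^{\top}\,\nabla_{\theta^{\mu}}\mu(s_i|\theta^{\mu}) = \nabla_aQ(s, a | \theta^Q)|_{s=s_i,a=\mu(s_i)}\,\nabla_{\theta^{\mu}}\mu(s|\theta^{\mu})|_{s_i},$$

which is exactly the summand in the first formula. (I am assuming here that moving $\nabla_{\theta^{\mu}}$ inside the sum is only valid because $g_i$ is held constant; please correct me if that assumption is wrong.)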
I then want to implement this in the following way: use $\nabla_aQ(s, a | \theta^Q)|_{s=s_i,a=\mu(s_i)}$ as the error term for the backpropagation algorithm. $\mu$ does not have a loss function, so the calculation of $\delta$ for the last layer is just:
$$\delta^{(n)} = f'(z^{(n)}),$$ with $n$ the final layer.
The backpropGradient function in the DeepLearning4J library takes $\epsilon$ as input, which is multiplied element-wise with $f'(z^{(n)})$. So if $\epsilon$ is replaced with $\nabla_aQ(s, a | \theta^Q)|_{s=s_i,a=\mu(s_i)}$, it should give me the correct gradient, right? That is:
$$\delta^{(n)} = \nabla_aQ(s, a | \theta^Q)|_{s=s_i,a=\mu(s_i)}f'(z^{(n)}),$$ with $n$ the final layer.
I would then use this $\delta^{(n)}$ to compute the rest of the backpropagation algorithm as usual.
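To make sure I have the last-layer delta right, here is a minimal plain-Java sketch of the computation I have in mind (this is not DL4J code; the tanh output activation, the single-sample setup, and all names are assumptions of mine):

```java
import java.util.Arrays;

// Sketch of the proposed last-layer delta for the actor network mu:
// delta^(n) = (dQ/da) (element-wise) f'(z^(n)),
// where dQda plays the role of epsilon passed to backpropGradient.
public class ActorDeltaSketch {

    // Derivative of the (assumed) tanh activation: f'(z) = 1 - tanh(z)^2
    static double tanhPrime(double z) {
        double t = Math.tanh(z);
        return 1.0 - t * t;
    }

    // dQda  : the critic's action gradient, nabla_a Q(s, a) at a = mu(s)
    // z     : pre-activations z^(n) of the actor's final layer
    static double[] lastLayerDelta(double[] dQda, double[] z) {
        double[] delta = new double[dQda.length];
        for (int j = 0; j < dQda.length; j++) {
            delta[j] = dQda[j] * tanhPrime(z[j]); // element-wise product
        }
        return delta;
    }

    public static void main(String[] args) {
        double[] dQda = {0.5, -1.0}; // pretend critic gradient (made up)
        double[] z = {0.0, 1.0};     // pretend final-layer pre-activations
        System.out.println(Arrays.toString(lastLayerDelta(dQda, z)));
    }
}
```

The idea is that this per-component product is what replacing $\epsilon$ with the critic's action gradient would compute inside backpropGradient, before the deltas are propagated to earlier layers.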