Why is DDPG an off-policy method while policy gradient is by definition on-policy?
DDPG is updated in an off-policy manner, while policy gradient methods are on-policy. Does that mean DDPG is not a policy gradient method?
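For context, here is a minimal NumPy sketch of the kind of update the question is about: DDPG's deterministic policy gradient, trained on states drawn from a replay buffer rather than from the current policy, which is exactly what makes it off-policy. The linear policy, the quadratic critic, and all names here are illustrative assumptions, not part of any real DDPG implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear deterministic policy mu(s) = W @ s (illustrative assumption).
state_dim, action_dim = 4, 2
W = rng.normal(size=(action_dim, state_dim)) * 0.1

# Hypothetical known critic Q(s, a) = -||a - s[:2]||^2; in real DDPG the
# critic is itself a learned network, omitted here for brevity.
def grad_q_wrt_a(s, a):
    return -2.0 * (a - s[:action_dim])

# Replay buffer filled earlier by some *behaviour* policy. The states were
# not generated by the current W -- this is what makes the update off-policy.
replay_states = [rng.normal(size=state_dim) for _ in range(64)]

# Deterministic policy gradient: grad_W J ~= E[ grad_a Q(s, mu(s)) * grad_W mu(s) ],
# where the action is recomputed with the *current* policy, so no importance
# weights over the behaviour policy's actions are needed.
lr = 0.05
for _ in range(200):
    batch = [replay_states[i] for i in rng.integers(0, 64, size=16)]
    grad_W = np.zeros_like(W)
    for s in batch:
        a = W @ s                                  # current policy's action
        grad_W += np.outer(grad_q_wrt_a(s, a), s)  # chain rule through mu
    W += lr * grad_W / len(batch)                  # gradient ascent on Q
```

The key line is `a = W @ s`: the gradient is evaluated at the current policy's own action, not at whatever action the behaviour policy originally took, which is why stored off-policy states are usable without importance sampling.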