I am working on Markov decision processes (MDPs) and am trying to show that a randomized policy cannot be optimal for an MDP. Is there any way to prove that if an optimal policy exists for an MDP, then it must be deterministic?
Thank you