How to show that a randomized policy cannot be optimal for an MDP


I am working on Markov decision processes (MDPs) and am trying to show that a randomized policy cannot be optimal for an MDP. Is there any way to prove that if an MDP has an optimal policy, then that policy must be deterministic?

Thank you
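
For reference, here is a sketch of the standard argument, under assumptions the question does not state: a finite state space $S$, a finite action space $A$, stationary policies, and the discounted criterion with $\gamma \in [0,1)$. The symbols $r$, $P$, $V^*$ and $Q^*$ are notation introduced here for illustration. The sketch suggests that the claim holds as stated only when the maximizing action is unique in every state; when actions tie, a randomized policy can also be optimal, although a deterministic optimal policy still exists.

```latex
% Assumed setting: finite S and A, discount 0 <= \gamma < 1, stationary policies.
\begin{align*}
Q^*(s,a) &= r(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a)\, V^*(s'),
&
V^*(s) &= \max_{a \in A} Q^*(s,a).
\end{align*}
% A randomized policy \pi(\cdot \mid s) can only average over actions:
\begin{align*}
\sum_{a \in A} \pi(a \mid s)\, Q^*(s,a) \;\le\; \max_{a \in A} Q^*(s,a) = V^*(s),
\end{align*}
% with equality if and only if \pi(\cdot \mid s) puts all of its mass on
% \arg\max_{a} Q^*(s,a).
% Hence the deterministic policy \pi^*(s) \in \arg\max_{a} Q^*(s,a) is optimal,
% and a stationary policy \pi is optimal exactly when, in every state s,
% its support is contained in \arg\max_{a} Q^*(s,a).
```

Under these assumptions, a policy that genuinely randomizes in some state is suboptimal if the maximizer of $Q^*(s,\cdot)$ in that state is unique, and it remains optimal if it only randomizes among tied maximizers.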