Log probability of following a trajectory under an optimal policy

14 Views Asked by Bumbble Comm At 11 Apr 2026 - 11:14

In reinforcement learning, is the log probability of following a trajectory under an optimal policy equal to the sum of rewards for that trajectory? i.e.

$\log(p(\tau)) = \sum^T_{t=1}r(s_t,a_t)$

I've seen this stated in this blog post: https://dibyaghosh.com/blog/probability/kldivergence.html ("We know that the probability of a trajectory under optimality is exponential in the sum of rewards received on the trajectory. $\log(p(\tau)) = \sum^T_{t=1}r(s_t,a_t)$")

If so, why? It feels like there might be a connection to information theory, since this would mean that maximising the reward minimises the information content (surprisal) of the trajectory, $-\log(p(\tau))$. However, I can't think of a way to prove to myself that this would be the case, and I'm struggling to find supporting references.
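To make the statement concrete, here is a minimal sketch of how I read the "exponential in the sum of rewards" part, assuming it means $p(\tau) \propto \exp\left(\sum_t r(s_t,a_t)\right)$ over a finite set of trajectories with deterministic dynamics. The toy rewards and all names (`reward`, `total_reward`, `log_Z`) are made up for illustration and do not come from the blog post. Under that assumption the log probability matches the sum of rewards only up to an additive normalising constant, $-\log Z$, which the quoted equation seems to leave out.

```python
import itertools
import math

# Hypothetical toy problem: 3 time steps, 2 actions per step, deterministic dynamics.
# reward[t][a] is the reward for taking action a at step t (made-up numbers).
reward = [[0.0, 1.0], [0.5, -0.5], [2.0, 0.0]]


def total_reward(traj):
    """Sum of rewards along a trajectory, given as a tuple of action indices."""
    return sum(reward[t][a] for t, a in enumerate(traj))


# Enumerate every possible action sequence (2^3 = 8 trajectories).
trajectories = list(itertools.product(range(2), repeat=len(reward)))

# Assumption: p(tau) is proportional to exp(sum of rewards), so we normalise by Z.
log_Z = math.log(sum(math.exp(total_reward(tau)) for tau in trajectories))
log_p = {tau: total_reward(tau) - log_Z for tau in trajectories}

# The gap between log p(tau) and the sum of rewards is the same for every trajectory.
gaps = [log_p[tau] - total_reward(tau) for tau in trajectories]
for tau in trajectories:
    print(tau, round(total_reward(tau), 3), round(log_p[tau], 3))
```

In this toy case every entry of `gaps` equals `-log_Z`, so the ranking of trajectories by probability is the same as their ranking by total reward, but the two quantities are not literally equal unless $Z = 1$.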

Thank you for your help!
