In reinforcement learning, is the log probability of following a trajectory under an optimal policy equal to the sum of rewards for that trajectory? i.e.
$\log(p(\tau)) = \sum^T_{t=1}r(s_t,a_t)$
I've seen this stated in this blog post: https://dibyaghosh.com/blog/probability/kldivergence.html ("We know that the probability of a trajectory under optimality is exponential in the sum of rewards received on the trajectory. $\log(p(\tau)) = \sum^T_{t=1}r(s_t,a_t)$")
If so, why? It feels like there might be a connection to information theory, since this would mean that minimising the total reward maximises the trajectory's information content (i.e. maximises the surprisal $-\log(p(\tau))$). However, I can't think of a way to prove to myself that this is the case, and I'm struggling to find supporting references.
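To make my understanding concrete, here is a toy numeric sketch (my own construction, not from the blog post) of what I take the claim to mean: if $p(\tau) \propto \exp\!\big(\sum_t r(s_t, a_t)\big)$ over a finite set of trajectories, then $\log(p(\tau))$ equals the trajectory's return only up to the additive constant $-\log Z$ from normalisation. The trajectory names and return values below are made up for illustration.

```python
import math

# Hypothetical returns R(tau) = sum_t r(s_t, a_t) for three toy trajectories.
returns = {"tau1": 1.0, "tau2": 2.5, "tau3": 0.5}

# Distribution "exponential in the sum of rewards": p(tau) = exp(R(tau)) / Z.
Z = sum(math.exp(R) for R in returns.values())
p = {tau: math.exp(R) / Z for tau, R in returns.items()}

# log p(tau) matches the return R(tau) only up to the constant -log(Z),
# so the equality log p(tau) = R(tau) seems to hold at best up to normalisation.
for tau, R in returns.items():
    assert math.isclose(math.log(p[tau]), R - math.log(Z))
    print(tau, math.log(p[tau]), R - math.log(Z))
```

Is this "up to a constant" reading the right way to interpret the blog post's statement, or is there a stronger sense in which the equality holds exactly?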
Thank you for your help!