How can I prove the probability of a trajectory under policy $\pi$ in reinforcement learning?


I can prove $P(A_t, S_{t+1}, \dots, S_T \mid S_t \sim \pi) = \prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)$,
but I can't prove $P(A_t, S_{t+1}, \dots, S_T \mid S_t, A_{t:T-1} \sim \pi) = \prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)$.
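
For reference, a sketch of how the first identity can be obtained, assuming the standard MDP setup in which each action is drawn from $\pi(\cdot \mid S_k)$ and each next state from $p(\cdot \mid S_k, A_k)$: apply the chain rule once, use the Markov property to drop the earlier history, and repeat on the remaining factor.

$$
\begin{aligned}
P(A_t, S_{t+1}, \dots, S_T \mid S_t)
&= P(A_t \mid S_t)\, P(S_{t+1} \mid S_t, A_t)\, P(A_{t+1}, S_{t+2}, \dots, S_T \mid S_t, A_t, S_{t+1}) \\
&= \pi(A_t \mid S_t)\, p(S_{t+1} \mid S_t, A_t)\, P(A_{t+1}, S_{t+2}, \dots, S_T \mid S_{t+1}) \\
&= \dots = \prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k).
\end{aligned}
$$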

Assume $T = t+1$. Then the claim becomes $P(A_t, S_{t+1} \mid S_t, A_t) = \pi(A_t \mid S_t)\, p(S_{t+1} \mid S_t, A_t)$.
I think $\pi(A_t \mid S_t)\, p(S_{t+1} \mid S_t, A_t) = P(A_t, S_{t+1} \mid S_t)$,
so that would mean $P(A_t, S_{t+1} \mid S_t, A_t) = P(A_t, S_{t+1} \mid S_t)$. Is this right?
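
One way to sanity-check the $T = t+1$ case is numerically, on a toy MDP. The sketch below is only an illustration: the tables `pi` and `p` and the helper names `joint` and `given_action` are made up here, not taken from any book or library. It compares the joint probability $P(A_t, S_{t+1} \mid S_t)$ with the probability obtained by actually conditioning on the value of $A_t$.

```python
# Toy MDP with 2 states and 2 actions; all numbers are made up for illustration.
pi = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.6, 1: 0.4}}  # pi[s][a] = pi(a | s)
p = {                                            # p[s][a][s_next] = p(s_next | s, a)
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.75}},
}


def joint(a, s_next, s):
    """P(A_t = a, S_{t+1} = s_next | S_t = s) when A_t is drawn from pi."""
    return pi[s][a] * p[s][a][s_next]


def given_action(s_next, s, a):
    """P(S_{t+1} = s_next | S_t = s, A_t = a), derived from the joint."""
    # Marginal probability of the action: sum the joint over all next states.
    marginal_a = sum(joint(a, x, s) for x in p[s][a])
    return joint(a, s_next, s) / marginal_a


s, a, s_next = 0, 1, 1
print("joint, action drawn from pi:  ", joint(a, s_next, s))
print("conditioned on the action value:", given_action(s_next, s, a))
```

With these tables the two printed numbers differ: the first contains the factor $\pi(A_t \mid S_t)$ and the second does not. That would suggest reading $A_{t:T-1} \sim \pi$ as "the actions are sampled from $\pi$" rather than as conditioning on the realized values of the actions.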

I'm so confused. Please help me.