How is this reinforcement learning value formula read / understood?


$$V_\pi(s) = E[R_t|s_t=s,\pi]$$

This is the value function for state $s$ under policy $\pi$, where $R_t$ is the return and everything is evaluated at time $t$. I was wondering how I should read and understand this. As far as I understand, $E$ means expected value, but I have no clue how the inside of the brackets should be read.

Best answer

The right-hand side of the equation appears to read:

The expected value of the return at time $t$, assuming that the policy is $\pi$ and the state at time $t$ is $s$.
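Written out, and assuming for illustration that the return only takes discrete values $r$ (the question does not say this), that reading corresponds to a probability-weighted average in which every probability is conditioned on the same two facts:

$$V_\pi(s) = E[R_t \mid s_t = s, \pi] = \sum_{r} r \, P(R_t = r \mid s_t = s, \pi)$$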

In set-builder notation the pipe means "such that"; inside an expectation it denotes conditioning and is usually read as "given", or more loosely as "assuming". I don't love the comma notation, but many authors use it to mean the logical AND. So the literal reading would be "the expected value of $R_t$ given that $s_t=s$ AND the policy is $\pi$"; I parsed the rest of it based on your description of the left-hand side.
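If it helps to see the conditioning concretely, here is a minimal Python sketch that estimates $E[R_t|s_t=s,\pi]$ by averaging sampled returns. Everything in it is an assumption made for illustration and not taken from the question: the three-state chain, the policy, the discount factor `GAMMA`, and the choice to define the return as a discounted sum of rewards. "Given $s_t=s$" becomes "start every episode in state `s`", and "given $\pi$" becomes "choose every action with `policy`".

```python
import random

GAMMA = 0.9  # assumed discount factor; the question does not define R_t


def policy(state, rng):
    """Hypothetical policy pi: move right 80% of the time, otherwise stay."""
    return "right" if rng.random() < 0.8 else "stay"


def step(state, action):
    """Hypothetical chain 0 -> 1 -> 2, where state 2 is terminal.

    Reward is 1.0 on the transition into state 2 and 0.0 otherwise.
    """
    next_state = state + 1 if action == "right" else state
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward, next_state == 2


def sample_return(start_state, rng):
    """One sample of the return, conditioned on starting in start_state
    and on following the policy above until the episode ends."""
    state, total, discount = start_state, 0.0, 1.0
    done = False
    while not done:
        action = policy(state, rng)
        state, reward, done = step(state, action)
        total += discount * reward
        discount *= GAMMA
    return total


def estimate_value(start_state, episodes=20000, seed=0):
    """Monte Carlo estimate of V_pi(start_state): the average sampled return.

    Only meaningful for the non-terminal states 0 and 1.
    """
    rng = random.Random(seed)
    total = sum(sample_return(start_state, rng) for _ in range(episodes))
    return total / episodes


if __name__ == "__main__":
    print(estimate_value(0), estimate_value(1))
```

For this toy chain the exact values can be worked out by hand: $V_\pi(1) = 0.8/0.82 \approx 0.976$ and $V_\pi(0) = 0.72 \cdot V_\pi(1)/0.82 \approx 0.857$, so the printed estimates should land close to those numbers. Changing the policy or the start state changes the estimate, which is exactly what the two conditions inside the brackets express.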