Trouble Understanding Notation in Reinforcement Learning Paper


I'm looking at this (warning: this is a download of a pdf) paper and am having trouble parsing the notation at the top of page 11, steps 4.1 and 4.2.

$\forall i \leq t \in T$, $\forall x_i, a_i$: update all Q-values according to their eligibility traces

$Q_t^{k+1}(x_i, a_i) \leftarrow Q_i^{k}(x_i, a_i) + \alpha(x_i^k, a_i^k)\,\delta_t^k\,e_t^k(x_i, a_i)$

Specifically, I'm having trouble telling what the $i$ is all about in step 4.1. $i$ and $t$ seem to be used interchangeably, but I'm sure that's not actually what's going on. Any help would be greatly appreciated.



Read $\forall$ as "for all" or "for each" and $\in$ as "in" or "belongs to".


I think your confusion is because you have two loops:

foreach t ranging from 1 to m
   ...
   foreach  i ranging from 1 to t
       update Q values, etc...
   ...

Your algorithm is using eligibility traces.

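You sample a trajectory of at most $m$ steps, and $t$ is the current step, i.e. the length of the trajectory so far. At every step $t$ you revisit each state-action pair visited so far, at steps 1 through $t$, and $i$ is the index of that inner loop. So $t$ says where you are in the trajectory, while $i$ says which earlier pair is being updated.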
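To make the two indices concrete, here is a minimal sketch of the nested loops in Python. It is not the paper's algorithm: it assumes a tabular SARSA($\lambda$)-style update with accumulating traces, a constant step size `alpha` in place of $\alpha(x_i, a_i)$, and a hypothetical `run_episode` helper that takes a pre-recorded list of transitions. It also loops over the distinct pairs seen so far rather than over every step index, so a pair visited twice is updated once per step.

```python
from collections import defaultdict


def run_episode(transitions, alpha=0.5, gamma=0.9, lam=0.8):
    """Illustrative trace-based update (assumed SARSA(lambda)-style, not the paper's).

    transitions: list of (x, a, r, x_next, a_next) tuples, one per step t.
    """
    Q = defaultdict(float)   # Q-values, default 0
    e = defaultdict(float)   # eligibility traces, default 0
    visited = []             # distinct (x_i, a_i) pairs seen up to step t

    # Outer loop: t is the current step of the trajectory.
    for t, (x, a, r, x_next, a_next) in enumerate(transitions, start=1):
        # The TD error delta_t is computed once per step t.
        delta = r + gamma * Q[(x_next, a_next)] - Q[(x, a)]

        # Bump the trace of the pair visited at step t.
        e[(x, a)] += 1.0
        if (x, a) not in visited:
            visited.append((x, a))

        # Inner loop: i ranges over the pairs visited so far (i <= t).
        for i, (x_i, a_i) in enumerate(visited, start=1):
            # Same delta_t for every i, scaled by that pair's own trace.
            Q[(x_i, a_i)] += alpha * delta * e[(x_i, a_i)]
            # Decay the trace so older pairs receive less credit.
            e[(x_i, a_i)] *= gamma * lam

    return Q


if __name__ == "__main__":
    # Two-step trajectory: no reward on the first step, reward 1 on the second.
    episode = [("s0", "a", 0.0, "s1", "a"), ("s1", "a", 1.0, "end", None)]
    Q = run_episode(episode)
    print(Q[("s0", "a")], Q[("s1", "a")])
```

In this sketch the reward received at step $t = 2$ also changes the value of the pair visited at step 1, which is exactly what the inner loop over $i$ is for: $\delta_t$ belongs to the current step, while $(x_i, a_i)$ and its trace belong to an earlier one.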