What's the fundamental difference between tabular Q-learning and Q-learning (with off-policy TD control)?


I have two equations.

  1. Q-learning with off policy TD-control :

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)\right]$$

  2. Tabular Q-learning:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a')\right)$$

I don't understand which one is a better fit for a simple environment. Don't they both rest on the same TD-learning principles? Wouldn't tabular Q-learning be the better choice for a simple game such as Tic-Tac-Toe, because its Q-values seem to move faster (what I mean is that the Q-values rise or fall sharply in response to rewards or punishments)?

There is 1 answer below.


Tabular Q-learning *is* off-policy TD learning. The two updates you wrote are not two different algorithms: they are the same rule in two algebraic forms.
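To see this, expand the convex-combination form; this is plain algebra, with no extra assumptions:

$$\begin{aligned}
(1-\alpha)\,Q(s,a) + \alpha\bigl(r + \gamma \max_{a'} Q(s',a')\bigr)
&= Q(s,a) - \alpha\,Q(s,a) + \alpha\bigl(r + \gamma \max_{a'} Q(s',a')\bigr)\\
&= Q(s,a) + \alpha\bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr],
\end{aligned}$$

which is exactly the standard Q-learning update with TD target $r + \gamma \max_{a'} Q(s',a')$. The speed at which values move is controlled by $\alpha$ in both forms, so neither is inherently "faster".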

It is one way to implement Q-learning when you have enough memory, i.e., when the state space is small enough to store a value for every state-action pair.

When the state space is huge, we usually use function approximation instead, such as linear function approximation or a neural network. (The tabular case can be seen as linear function approximation with one-hot features, one feature per state-action pair.)
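For a small environment like Tic-Tac-Toe, the tabular update fits in a few lines. Here is a minimal sketch (the function name `q_update` and the toy transition are made up for illustration; a real agent would also need an exploration policy such as epsilon-greedy):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Greedy value of the next state (off-policy: max over actions,
    # regardless of which action the behavior policy actually takes).
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Toy example: repeat one transition with reward 1 into a state
# whose action values are all still 0.
Q = {}
actions = ["left", "right"]
q_update(Q, "s0", "left", 1.0, "s1", actions)  # value moves from 0.0 toward 1.0
q_update(Q, "s0", "left", 1.0, "s1", actions)  # and further on the second visit
print(Q[("s0", "left")])
```

With `alpha=0.5` the estimate moves halfway toward the target on each visit (0.0 → 0.5 → 0.75 here), which is the "velocity" you are asking about: it is set by the learning rate, not by which of the two equation forms you use.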