Bellman equation: loss function and the optimal Q-value


I am currently working on reinforcement learning, and there is this Bellman equation that I need so I can minimize the loss function of my neural network. When we calculate the loss, we take the Q-value generated by my neural network, q(s,a), and subtract it from the optimal Q-value, q*(s,a). I don't understand the difference between q* and q: if we already have the optimal Q-value, why do I even bother to compute q(s,a)? The same goes for Q-learning, where I look up max_a q(s',a) in my Q-table to update the table. I don't see the difference between the two, because right now the way I get my q(s,a) is the same as q*(s,a). Help is really appreciated; I googled the whole weekend and couldn't find a solution.

Here is an image that may illustrate my problem.


BEST ANSWER

Okay, I just realized my mistake when I programmed it. It's actually quite obvious: the Q-value q(s,a) stored in the table should become equal to the value we compute with the Bellman equation.

import numpy as np

q_optimal = reward + discount_rate * np.max(q_table[new_state, :])  # Bellman target
q_now = q_table[state, action]                                      # current table estimate
loss = q_optimal - q_now                                            # TD error

As you can see, q_optimal is calculated differently from q_now, so they are definitely not the same; I didn't get that at first.
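To make the distinction concrete, here is a minimal sketch of one tabular Q-learning update that puts the two quantities side by side. The tiny MDP below (4 states, 2 actions) and the state/action/reward values are made up for illustration; the learning rate and discount are arbitrary hyperparameters.

```python
import numpy as np

# Hypothetical toy setup: 4 states, 2 actions, table initialized to zero.
q_table = np.zeros((4, 2))
learning_rate = 0.1
discount_rate = 0.99

# One made-up transition (s, a, r, s').
state, action, reward, new_state = 0, 1, 1.0, 2

# "q_optimal": the Bellman target, built from the reward and the
# best current estimate for the next state.
td_target = reward + discount_rate * np.max(q_table[new_state, :])

# "q_now": what the table currently says for (s, a).
td_error = td_target - q_table[state, action]

# Move the current estimate a small step toward the target.
q_table[state, action] += learning_rate * td_error
```

The key point is that the target and the current estimate come from different computations even though both read the same table: the target bootstraps from the *next* state plus the observed reward, while the current estimate is just a lookup at (s, a). The update shrinks the gap between them.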