Why do bellman error gradients become big?


I read these notes on deep Q-learning (DQN), which said that Bellman error gradients can become pretty big. In the lecture video for that slide, the speaker said that we take the gradient of a squared Bellman error, and the derivative of a quadratic can become a large value. I'm a little lost as to why that is the case. DQN is basically performing regression, and we don't usually worry about regression gradients being large.

Here's the typical DQN loss function:

$$L(\theta) = \frac{1}{2}\Big(R_{t+1} + \gamma[\![ \max_{a} q_{\theta}(S_{t+1}, a)]\!] - q_{\theta}(S_{t}, A_t)\Big)^2$$

Here, $R_{t+1}$, $S_{t}$, and $A_t$ denote the reward, state, and action at the current time step, $S_{t+1}$ denotes the next state, and $q_{\theta}$ the parameterized q-values. Also, we do not take the gradient of the term within $[\![\,]\!]$, since this is a semi-gradient method rather than a true gradient method.
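To make the question concrete, here is a minimal sketch of what I understand the semi-gradient to be, assuming a linear q-function $q_\theta(s,a) = \theta^\top \phi(s,a)$ and made-up numbers (the feature vector `phi_sa` and the reward/target values are hypothetical, just for illustration). The gradient's magnitude scales linearly with the TD error, which is where I assume the "large gradient" concern comes from:

```python
def q(theta, phi):
    """Linear q-value: dot product of weights and features."""
    return sum(t * f for t, f in zip(theta, phi))

def semi_gradient(theta, phi_sa, reward, gamma, max_q_next):
    """Semi-gradient of 0.5 * (target - q(s,a))^2 w.r.t. theta.

    The target (reward + gamma * max_q_next) is treated as a constant,
    i.e. no gradient flows through the max term, matching the loss above.
    """
    td_error = reward + gamma * max_q_next - q(theta, phi_sa)
    # d/dtheta [ 0.5 * (target - theta . phi)^2 ] = -(td_error) * phi
    return [-td_error * f for f in phi_sa]

theta = [0.0, 0.0]          # untrained weights: q is zero everywhere
phi_sa = [1.0, 2.0]         # hypothetical features for (s, a)

# Early in training q(s,a) is near zero, so a large target makes the
# TD error (and hence every gradient component) large.
g_big = semi_gradient(theta, phi_sa, reward=100.0, gamma=0.99, max_q_next=50.0)
g_small = semi_gradient(theta, phi_sa, reward=1.0, gamma=0.99, max_q_next=0.5)
print(g_big)    # TD error is 149.5, so components are -149.5 * phi
print(g_small)  # TD error is 1.495, gradient is 100x smaller
```

So the gradient is (TD error) × (feature/activation gradient), and nothing in the squared loss bounds the TD error itself, unlike, say, a regression problem with normalized targets.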