What is suboptimality gap in reinforcement learning?

581 Views Asked by At

I was reading some research papers on Reinforcement Learning Theory, and I constantly encountered a term called the suboptimality gap. As I searched the internet, I couldn't find any information about this term. So, I wonder whether anyone here knows what this means?

1

There are 1 best solutions below

0
On BEST ANSWER

In Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, suboptimality gap associate with action $a$ at state $x$ is defined to be

$$gap_\infty(x,a)=V^{\pi^*}(x)-Q^{\pi^*}(x,a),$$

It is the difference in the value of a particular action from a particular state as compared to the optimal move.

Similar term has been used in bandit problem as well.