How to prove the convergence of an RL algorithm similar to Q-learning


I am studying reinforcement learning algorithms. I learned how to prove that Q-learning converges from [this paper]. I then found a [new paper] in which the authors propose a new RL algorithm whose update rule differs from that of standard Q-learning, and I am unsure how to prove that their algorithm converges. Can I use the same techniques as in the Q-learning convergence proof, or does the modified update require a different argument? Their update rule is:

$$\begin{aligned} Q_{t+1}\left(s_t, a_t\right) & \leftarrow\left(1-\alpha_\tau\right) Q_t\left(s_t, a_t\right)+\alpha_\tau\left[r\left(s_t, a_t\right)+\gamma \hat{V}_t\left(s_{t+1}\right)+b_\tau\right] \\ \hat{Q}_{t+1}\left(s_t, a_t\right) & \leftarrow \min \left\{\hat{Q}_t\left(s_t, a_t\right), Q_{t+1}\left(s_t, a_t\right)\right\} \\ \hat{V}_{t+1}\left(s_t\right) & \leftarrow \max_{a \in \mathcal{A}} \hat{Q}_{t+1}\left(s_t, a\right) . \end{aligned}$$
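For concreteness, here is a minimal sketch of one step of this coupled update in a tabular setting. The function and variable names, the optimistic initialization, and the particular values of the step size $\alpha_\tau$ and bonus $b_\tau$ are my own illustrative assumptions, not taken from the paper:

```python
import numpy as np

def update(Q, Q_hat, V_hat, s, a, r, s_next, alpha, bonus, gamma=0.9):
    """One step of the coupled update for the visited pair (s, a).

    Assumes tabular Q, Q_hat (shape [nS, nA]) and V_hat (shape [nS]);
    alpha is the step size alpha_tau, bonus is the term b_tau.
    """
    # Q_{t+1}(s,a) <- (1 - alpha) Q_t(s,a) + alpha [r + gamma * Vhat_t(s') + b]
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * V_hat[s_next] + bonus)
    # Qhat_{t+1}(s,a) <- min{ Qhat_t(s,a), Q_{t+1}(s,a) }  (Qhat never increases)
    Q_hat[s, a] = min(Q_hat[s, a], Q[s, a])
    # Vhat_{t+1}(s) <- max_a Qhat_{t+1}(s, a)
    V_hat[s] = Q_hat[s].max()
    return Q, Q_hat, V_hat

# Tiny usage example: 2 states, 2 actions, optimistic initial values.
nS, nA = 2, 2
Q = np.full((nS, nA), 10.0)
Q_hat = np.full((nS, nA), 10.0)
V_hat = np.full(nS, 10.0)
Q, Q_hat, V_hat = update(Q, Q_hat, V_hat, s=0, a=1, r=1.0, s_next=1,
                         alpha=0.5, bonus=0.1)
```

Writing it out this way makes the structure visible: the first line is a standard stochastic-approximation update (like Q-learning, but bootstrapping from $\hat{V}$ and with an added bonus), while the min and max steps make $\hat{Q}$ monotonically non-increasing, which is the kind of property a convergence proof would likely exploit.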