Mathematical intuition for why the iterative Bellman update converges to the optimal solution

I know that the mathematical justification for applying the Bellman equation iteratively to find the optimal policy in Reinforcement Learning rests on convergence results. I wonder, however, whether there is a mathematical intuition behind this convergence. In one of his lectures on YouTube, David Silver mentioned that it is some sort of contraction argument, and I wondered whether this can be made intuitive.
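To make the contraction idea concrete, here is a minimal sketch (not part of the original question) on a small, randomly generated finite MDP. The helper names `bellman_optimality`, `sup_dist` and `random_mdp` are hypothetical. The sketch applies the Bellman optimality operator $T$ to two very different starting value vectors and prints how the sup-norm distance between them shrinks. The standard property being illustrated is $\|TV - TW\|_\infty \le \gamma \|V - W\|_\infty$: the max over actions and the expectation over next states can never increase the largest gap, and discounting then multiplies that gap by $\gamma < 1$. By the Banach fixed-point theorem, this forces every starting guess toward the same unique fixed point $V^*$.

```python
import random


def bellman_optimality(V, P, R, gamma):
    """Apply the Bellman optimality operator T to the value vector V.

    P[s][a] is a list of transition probabilities over next states,
    R[s][a] is the expected immediate reward for action a in state s.
    """
    n_states = len(V)
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
            for a in range(len(R[s]))
        )
        for s in range(n_states)
    ]


def sup_dist(U, W):
    """Sup-norm (max-norm) distance between two value vectors."""
    return max(abs(u - w) for u, w in zip(U, W))


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical toy MDP: random transition rows and rewards in [-1, 1]."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])  # normalise to a distribution
            R_s.append(rng.uniform(-1.0, 1.0))
        P.append(P_s)
        R.append(R_s)
    return P, R


if __name__ == "__main__":
    gamma = 0.9
    P, R = random_mdp(n_states=4, n_actions=2)
    # Two arbitrary, very different initial guesses for the value function.
    V = [100.0, -50.0, 0.0, 20.0]
    W = [0.0, 0.0, 0.0, 0.0]
    for k in range(10):
        d_before = sup_dist(V, W)
        if d_before == 0.0:
            break  # both sequences have already met
        V = bellman_optimality(V, P, R, gamma)
        W = bellman_optimality(W, P, R, gamma)
        d_after = sup_dist(V, W)
        # The ratio should never exceed gamma (up to floating-point noise).
        print(f"iter {k}: distance {d_after:.4f}, ratio {d_after / d_before:.3f}")
```

Whatever the two starting guesses, the printed ratio should stay at or below `gamma`, so the two sequences are squeezed together geometrically and both end up at the same fixed point.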