Correct Understanding of Bellman Optimality

104 Views Asked by At

I was reading about the "Bellman Principle of Optimality" (https://en.wikipedia.org/wiki/Bellman_equation) :

enter image description here

It seems that the "Bellman Principle of Optimality" state that for some problem, "the overall optimal policy" can be considered as the "sum of optimal policies at each time step".

I was concerned about the following point : many times in computer science, if we were to break some problem into two sub-problems, the "optimal solution" for both of these sub-problems might not result in the "optimal solution" for the original problem. Taking the "optimal solution" for a sub-problem without considering impact of this solution for the entire problem is often described as a "Greedy Search".

  • Thus, in the "Bellman Principle of Optimality" - are we avoiding this "Greedy Search" problem? Are we somehow ensuring that the "sum of all locally optimal solutions will necessarily result in a globally optimal solution"?

  • It seems as if the "Bellman Principle of Optimality" is more of a "general framework" and does not provide any tangible methods of determining these optimal policies - my guess is that determining these optimal policies for real-world problems is often done through some Machine Learning algorithm? (note: I think that this is the intended use of "Dynamic Programming" - Dynamic Programming might be intended for finding these optimal policies.)

Thanks!