I was reading about the "Bellman Principle of Optimality" (https://en.wikipedia.org/wiki/Bellman_equation) :
It seems that the "Bellman Principle of Optimality" states that, for certain problems, "the overall optimal policy" can be viewed as the "sum of optimal policies at each time step".
I was concerned about the following point: many times in computer science, if we break a problem into two sub-problems, combining the "optimal solutions" of these sub-problems might not yield the "optimal solution" for the original problem. Taking the "optimal solution" for a sub-problem without considering the impact of this choice on the entire problem is often described as a "Greedy Search".
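To make my concern concrete, here is a toy example (the coin values are ones I made up for illustration): with coins {1, 3, 4} and a target of 6, always taking the locally best (largest) coin gives 4 + 1 + 1 = 3 coins, while the true optimum is 3 + 3 = 2 coins. A Bellman-style recursion over sub-targets does find the optimum:

```python
def greedy_coins(coins, target):
    """Repeatedly take the largest coin that still fits (locally optimal choice)."""
    count = 0
    for c in sorted(coins, reverse=True):
        while target >= c:
            target -= c
            count += 1
    return count

def dp_coins(coins, target):
    """Bellman-style recursion: opt(t) = 1 + min over coins c of opt(t - c)."""
    INF = float("inf")
    opt = [0] + [INF] * target  # opt[t] = fewest coins summing to t
    for t in range(1, target + 1):
        opt[t] = min((opt[t - c] + 1 for c in coins if c <= t), default=INF)
    return opt[target]

print(greedy_coins({1, 3, 4}, 6))  # 3 coins: 4 + 1 + 1
print(dp_coins({1, 3, 4}, 6))      # 2 coins: 3 + 3
```

So here the greedy choice is *not* part of any optimal solution, which is exactly the failure mode I am worried about.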
Thus, in the "Bellman Principle of Optimality", are we avoiding this "Greedy Search" problem? Are we somehow ensuring that the "sum of all locally optimal solutions" will necessarily result in a "globally optimal solution"?
It also seems as if the "Bellman Principle of Optimality" is more of a "general framework" and does not provide any tangible method for determining these optimal policies - my guess is that, for real-world problems, these optimal policies are often found through some Machine Learning algorithm? (Note: I think this is the intended use of "Dynamic Programming" - Dynamic Programming might be the method intended for finding these optimal policies.)
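For instance, my current understanding is that something like value iteration turns the Bellman equation V(s) = max_a [ r(s, a) + gamma * V(s') ] directly into an algorithm. Here is a minimal sketch on a toy two-state MDP (the states, actions, and rewards below are all invented for illustration):

```python
# Toy deterministic MDP: states 0 and 1, actions "stay" and "move".
# All numbers here are made-up illustration values.
rewards = {(0, "stay"): 0.0, (0, "move"): 1.0,
           (1, "stay"): 2.0, (1, "move"): 0.0}
next_state = {(0, "stay"): 0, (0, "move"): 1,
              (1, "stay"): 1, (1, "move"): 0}
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality update
# V(s) <- max_a [ r(s, a) + gamma * V(next(s, a)) ] until it converges.
V = {0: 0.0, 1: 0.0}
for _ in range(200):
    V = {s: max(rewards[s, a] + gamma * V[next_state[s, a]]
                for a in ("stay", "move"))
         for s in (0, 1)}

# The optimal policy is then just the action that is greedy with respect
# to the *converged* values - which is where the principle of optimality
# differs from plain greedy search on immediate rewards.
policy = {s: max(("stay", "move"),
                 key=lambda a: rewards[s, a] + gamma * V[next_state[s, a]])
          for s in (0, 1)}
print(policy)  # {0: 'move', 1: 'stay'}
```

Note that acting greedily on immediate rewards alone would already pick "move" in state 0 here, but in general the policy is greedy with respect to V, which already summarizes the whole future - so (if I understand correctly) the "local" choice is only local in appearance.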
Thanks!
