Can Machine Learning models be considered as "Approximate Dynamic Programming"?


In the context of certain statistical/machine learning models, such as models that are trying to estimate "optimal policies" (e.g. reinforcement learning) - can we consider these models as "approximate dynamic programming"?

My understanding is that the Bellman principle of optimality states that, for optimization problems believed to have "optimal substructure"*, Dynamic Programming can in theory produce a globally optimal solution by breaking the original problem into smaller subproblems and combining their optimal solutions. I am not sure if I have understood this correctly, but the principle also seems to imply that for problems with optimal substructure, Dynamic Programming can work recursively with "limited lookahead", meaning that optimal solutions can be found without spending "too much time" moving backwards/forwards and reconsidering policies at previous steps.

However, Dynamic Programming in the classical sense cannot work effectively on large problems. Therefore, a "Value Function" is estimated to approximate the value of different states, and Dynamic Programming is then applied on top of this estimate to approximate a globally optimal solution. In this approximate form (i.e., Approximate Dynamic Programming), the value function is implemented through a statistical/machine learning model.
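To make this concrete, here is a minimal sketch of the idea (sometimes called fitted value iteration): instead of storing a table of state values, we repeatedly compute Bellman backup targets and refit a regression model to them. The chain problem, the linear model, and all numbers below are made up for illustration, not taken from any specific reference:

```python
GAMMA = 0.9
N = 10  # states 0..9; entering state 9 pays +1 and ends the episode

def step(s, a):
    """Deterministic transition: a is -1 (left) or +1 (right)."""
    s2 = max(0, min(N - 1, s + a))
    return s2, (1.0 if s2 == N - 1 and s != N - 1 else 0.0)

def fit_line(xs, ys):
    """Least-squares fit y ~ w0 + w1*x -- this is the 'statistical model'."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - w1 * mx, w1

w0, w1 = 0.0, 0.0
for _ in range(50):
    xs = list(range(N - 1))  # the non-terminal states
    # Bellman backup targets, computed with the *current* fitted model
    # (the terminal state is worth 0, so we never bootstrap from it)
    ys = [max(r + (0.0 if s2 == N - 1 else GAMMA * (w0 + w1 * s2))
              for s2, r in (step(s, a) for a in (-1, 1)))
          for s in xs]
    w0, w1 = fit_line(xs, ys)  # refit the model to the backed-up values

print(round(w1, 3))  # positive slope: states nearer the goal are worth more
```

The fitted line replaces the exact value table that classical Dynamic Programming would need, which is the whole point when the state space is too large to enumerate.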

Is my understanding of this correct - can certain Statistical/Machine Learning Models be considered as Approximate Dynamic Programming?

Thanks!

*Note: I am told that there is no real way to determine whether an optimization problem truly has "optimal substructure" - we often assume that it does and solve it under this assumption. As a result, solutions to real-world problems may be locally rather than globally optimal.

Note: I think the main difference between Greedy Search and Dynamic Programming is that Greedy Search makes the locally optimal choice at each subproblem and reaches the final solution by joining these choices together - this yields a globally optimal solution only for some problems. Dynamic Programming computes optimal solutions at each step by considering/projecting the effect of these solutions on future steps, and can reconsider previously chosen solutions. In short, Dynamic Programming returns an optimal solution on every problem where Greedy Search does - but there are problems on which Greedy Search fails to return an optimal solution while Dynamic Programming succeeds.
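The greedy-vs-DP distinction can be seen on the classic coin-change problem. The denominations below are deliberately chosen (a standard textbook-style example, not from any particular source) so that the greedy choice is suboptimal while DP finds the true minimum:

```python
def greedy_coins(amount, coins):
    """Repeatedly take the largest coin that fits (the locally optimal choice)."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count

def dp_coins(amount, coins):
    """Bottom-up DP: best[a] = fewest coins summing to a."""
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a:
                best[a] = min(best[a], best[a - c] + 1)
    return best[amount]

print(greedy_coins(6, [1, 3, 4]))  # 3 coins: 4 + 1 + 1
print(dp_coins(6, [1, 3, 4]))      # 2 coins: 3 + 3
```

Greedy commits to the coin 4 and can never revisit that choice; DP considers how each choice affects the remaining subproblem, which is exactly the reconsideration described above.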


Is my understanding of this correct - can certain Statistical/Machine Learning Models be considered as Approximate Dynamic Programming?

I believe there may be some conceptual issues in your question.

  • A model is an estimate $\hat{f}$ of some unknown function $f$. For example, in a two-dimensional problem, we try to predict a variable $y$ from an input $x$ by assuming a functional relationship $y = f(x) + \epsilon$, where $\epsilon$ is the part of $y$ that is not explained by $x$. An example of this is Linear Regression, which produces a model $\hat{f}$ that aims at approximating the unknown but assumed-linear function relating the two variables.

  • Approximate dynamic programming is a technique for solving large-scale stochastic control processes, i.e., processes that consist of a state set $S$, with the system being in a particular state $S_t$ at time $t$, from which we can make a decision $x_t$ out of a set $X$. The decision yields a reward or cost and brings about a new state (so that every state is conditionally dependent on the previous one). The main concern of this technique is to find a decision function $X^{\pi}(S_t)$, denoting the policy $\pi$, that yields the best decisions.
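For a small enough state set, the exact (non-approximate) version of this idea is just value iteration on the Bellman equation. A toy sketch, with a made-up four-state chain where only the terminal state pays a reward:

```python
GAMMA = 0.9
N = 4  # states 0..3; state 3 is terminal

def step(s, a):
    """Deterministic move: a = -1 (left) or +1 (right), clipped to the chain."""
    s2 = max(0, min(N - 1, s + a))
    return s2, (1.0 if s2 == N - 1 and s != N - 1 else 0.0)

V = [0.0] * N
for _ in range(100):  # sweep until the values converge
    for s in range(N - 1):  # the terminal state keeps V = 0
        V[s] = max(r + GAMMA * V[s2]
                   for s2, r in (step(s, a) for a in (-1, 1)))

print([round(v, 3) for v in V])  # [0.81, 0.9, 1.0, 0.0]
```

Each state's value discounts the reward by how far it is from the goal; the optimal policy simply moves toward the state with the highest backed-up value. Approximate DP replaces the table `V` with a fitted model when $S$ is too large to enumerate.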

As you can see, the two concepts are very different. The first denotes a mathematical concept, while the second is a technique for solving a problem. So when you ask whether certain statistical or machine learning models can be considered approximate dynamic programming, the short answer is no.

Now, if your question was whether approximate dynamic programming is (or can be) used in certain ML algorithms, the answer is yes. Because the core idea of dynamic programming is to minimize costs in decision-making processes, it is widely used in reinforcement learning and self-learning in general. It is not limited to these fields, though.
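As one illustration of DP ideas inside reinforcement learning, tabular Q-learning applies the Bellman backup to sampled transitions, learning a good policy without the explicit transition model that classical DP needs. Everything below (the chain environment, the constants) is invented for the sketch:

```python
import random
random.seed(0)

N, GAMMA, ALPHA, EPS = 6, 0.9, 0.5, 0.2
Q = [[1.0, 1.0] for _ in range(N)]  # optimistic init encourages exploration

def step(s, a):
    """a = 0 moves left, a = 1 moves right; +1 for reaching state N-1."""
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N - 1 and s != N - 1 else 0.0)

for _ in range(1000):
    s = random.randrange(N - 1)       # start each episode in a random state
    for _ in range(100):              # cap the episode length
        greedy = 0 if Q[s][0] >= Q[s][1] else 1
        a = random.randrange(2) if random.random() < EPS else greedy
        s2, r = step(s, a)
        # Bellman backup on a sampled transition (terminal states are worth 0)
        target = r + (0.0 if s2 == N - 1 else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        if s2 == N - 1:
            break
        s = s2

policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(N - 1)]
print(policy)  # the learned policy moves right, toward the reward
```

The update rule is a stochastic version of the same backup that value iteration applies exhaustively, which is why reinforcement learning is often described as approximate dynamic programming from samples.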

As I was writing this, it occurred to me that it should be possible to solve clustering problems using dynamic programming. When I googled for information on this, I found, for example, this article proposing an optimal version of the $k$-Means algorithm using DP.