Given a feasible solution to a linear program, how to find a better solution?


Background: I have been applying (theoretically) the method in this paper to find a feasible solution to a linear program, when the constraints of the linear program are unknown.

This is in the context of reinforcement learning, specifically constrained Markov decision processes. (A constrained Markov decision process can be expressed as a linear program.)

The gist of the method in the linked paper is as follows: given a linear program defined by a matrix $A$ and a vector $b$, with constraints of the form $Ax \geq b$, we apply the ellipsoid method to search for a feasible $x$ without explicit knowledge of $(A, b)$. The method has access to an oracle to which we can send a proposed solution; the oracle either tells us we have found a feasible solution to $Ax \geq b$, or it tells us which constraint we violated the most.

That is the basic idea, although there are many complications that I will not describe here.
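For concreteness, here is a minimal sketch of that idea (not the paper's actual algorithm): a textbook ellipsoid-method feasibility search driven only by oracle feedback. I assume here that the oracle returns the row $a_i$ of the most-violated constraint; the `most_violated` helper constructs that feedback from a known $(A, b)$ purely for illustration.

```python
import numpy as np

def most_violated(A, b, x):
    """Illustrative oracle internals: return None if Ax >= b holds,
    else the index of the most-violated constraint."""
    slack = A @ x - b
    i = int(np.argmin(slack))
    return None if slack[i] >= 0 else i

def ellipsoid_feasibility(oracle, n, R=10.0, max_iter=1000):
    """Search for x with Ax >= b using only oracle feedback.

    oracle(x) returns None if x is feasible, else the row a_i of the
    most-violated constraint.  The initial ellipsoid is a ball of
    radius R assumed to contain the feasible region; n >= 2.
    """
    x = np.zeros(n)
    P = (R ** 2) * np.eye(n)  # shape matrix of the current ellipsoid
    for _ in range(max_iter):
        g = oracle(x)
        if g is None:
            return x  # feasible point found
        # The feasible set lies in the halfspace {z : g^T z >= g^T x},
        # so move the center toward it and shrink the ellipsoid.
        Pg = P @ g
        denom = np.sqrt(g @ Pg)
        x = x + Pg / ((n + 1) * denom)
        P = (n**2 / (n**2 - 1)) * (P - (2.0 / (n + 1)) * np.outer(Pg, Pg) / (g @ Pg))
    return None  # no feasible point found within the iteration budget
```

Note that the search itself never reads $A$ or $b$ directly; all information about the constraints arrives through the oracle's answers.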

Question: Now that I have a (theoretical) method to find a feasible solution, I am interested in finding a better, or even optimal, solution.

To make this possible, it seems we must assume access to the objective function of the linear program, say $c^\intercal x$, so that we can compare two feasible solutions $x_1$ and $x_2$ and determine which is better.

I will suggest some possible approaches below.

  1. We could simply run the method in the linked paper again, get another feasible solution, and then compare the two feasible solutions using the objective function. However, this seems extremely inefficient.

  2. We could compute the gradient of the objective function, $\nabla_x (c^\intercal x) = c$, and then try some kind of gradient ascent/descent starting from an initial feasible solution $x_1$, e.g. $x_2 = x_1 + \eta c$. However, we have no guarantee of remaining within the feasible region when stepping in the direction of the gradient, so this alone doesn't seem to work.

  3. Could we use the feedback from the oracle to do some kind of projected gradient ascent/descent? I don't think I would get enough information from the oracle to do a projection.

  4. Given a starting feasible solution $x_1$, could we do some kind of greedy search over an $\epsilon$-ball centered at $x_1$, and then step in whichever direction seems best? This seems guaranteed to work only if the initial feasible solution $x_1$ lies in the interior of the feasible region to begin with.

I appreciate any ideas.