I have been thinking about the problem of generating cycles from a given node, on a strongly connected graph. The goal is to generate cycles that are good, with respect to an objective function $f$.
The obvious approach is to generate all the cycles from the given node and calculate the objective function for each one of them. It's not sufficient to calculate the objective function once, because $f$ is not pure and therefore the result does not depend only on the cycle. However, if a cycle was good in the past, there's good chance it will be good in the future (at least in the short-term, but I digress).
I was wondering if the solution can be made better if we already have a set of good cycles. I thought that we can interpret the graph as a Markov chain if we add edge weights that keep track of how many times the edge was traversed (from the set of already good cycles). Then, from a node, we can choose the next one probabilistically using the weights. The downside of this approach is that we're not guaranteed to obtain cycles in a reasonable time or with a reasonable length (very long cycles are very likely not good).
Perhaps the latter formulation renders the problem similar to a Markov Decision Process (MDP) problem, but I'm not that familiar with that topic, so here I am asking for references. I don't even know if my problem has a specific name to be honest. Have problems like this one already been studied somewhere? My own research has been quite unsuccessful.