Using Q-Learning to find the best mixture of algorithms


I'm working on an optimization procedure where one out of N strategies (actions) is run at each time step, leading to a decrease (the reward) in a global objective function. Exactly which strategy works best depends on many factors: the problem at hand, the solver parameters, and the current state of the solver as it evolves over time.
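For concreteness, a setup like this can be viewed as a single-state (bandit-style) Q-learning loop: one Q-value per strategy, updated from the observed decrease in the objective. The sketch below is only illustrative; `run_strategy`, the reward model, and all constants are hypothetical placeholders, not the actual solver.

```python
import random

N = 3                 # number of strategies (actions); hypothetical
alpha = 0.1           # learning rate (placeholder value)
gamma = 0.9           # discount factor (placeholder value)
epsilon = 0.1         # exploration rate (placeholder value)
Q = [0.0] * N         # one Q-value per strategy (single-state case)

def run_strategy(a):
    # Placeholder: running strategy `a` returns the decrease in the
    # global objective at this step (the reward). Here it is simulated
    # with noisy per-strategy means.
    return random.gauss([1.0, 0.5, 0.2][a], 0.1)

for t in range(1000):
    # epsilon-greedy selection: mostly exploit, sometimes explore
    if random.random() < epsilon:
        a = random.randrange(N)
    else:
        a = max(range(N), key=lambda i: Q[i])
    reward = run_strategy(a)
    # Single-state Q-learning update: the "next state" is the same
    # state, so the bootstrap term is the max over the same Q-vector.
    Q[a] += alpha * (reward + gamma * max(Q) - Q[a])
```

With a single state this reduces to a multi-armed bandit, which is why the choice of exploration scheme (the subject of the questions below) matters so much.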

I implemented a simple Q-learning selection method for determining which of the strategies should be used at any given time. I have two questions about this:

  1. The actions do not seem to be independent. By this I mean that applying one strategy (action) at time t might influence another strategy's reward at a later time t' > t. Does this break the assumptions behind Q-learning?

  2. To choose an action, I normalize the vector of Q-values to [0,1] and sample an action with probability proportional to this normalized vector. I also tried two other selection methods:

  • epsilon-greedy, where the best action is selected with probability p and a random action is chosen with probability 1-p, and
  • Boltzmann (softmax) selection.

However, they don't seem to work as well (and they require more parameters). Does my method make any sense? What could be its shortcomings? I am concerned that it might be working by luck, and that this might change with other inputs.
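For reference, the three selection rules discussed above can be sketched as follows. This is a minimal sketch; the `p` and `temperature` defaults are placeholders, not tuned values.

```python
import math
import random

def normalized_selection(Q):
    # The scheme from the question: rescale Q-values to [0, 1] and
    # sample an action with probability proportional to the result.
    lo, hi = min(Q), max(Q)
    if hi == lo:
        return random.randrange(len(Q))  # all equal: pick uniformly
    w = [(q - lo) / (hi - lo) for q in Q]
    return random.choices(range(len(Q)), weights=w)[0]

def epsilon_greedy(Q, p=0.9):
    # Best action with probability p, uniform random otherwise.
    if random.random() < p:
        return max(range(len(Q)), key=lambda i: Q[i])
    return random.randrange(len(Q))

def boltzmann(Q, temperature=1.0):
    # Softmax over Q-values; the temperature controls exploration.
    m = max(Q)  # subtract the max for numerical stability
    w = [math.exp((q - m) / temperature) for q in Q]
    return random.choices(range(len(Q)), weights=w)[0]
```

One visible property of the normalized scheme, as written, is that the lowest-valued action receives weight 0 and is never selected, and the selection probabilities depend on the absolute spread of the Q-values rather than only on their ranking.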