Maximizing the total reward of users waiting in different positions and lines, using reinforcement learning or other learning methods


There is a mathematical problem that I think can be solved using reinforcement learning and it would be great if you could help me with it.

Users are standing in $N$ lines. Each line has $M$ users and $M$ positions, with exactly one user per position.

$U_{i,j}$, with $1 \leq i \leq N$ and $1 \leq j \leq M$, is the user at the $j$-th position in the $i$-th line.

Each user receives a reward computed by a function $f$. The function $f$ depends on the user's position and on the ordering of all users in the same line:

Reward of $U_{i,j}$ $= f(j, U_{i,1}, U_{i,2}, \dots, U_{i,M})$

Users cannot move to a different line, but they can swap positions with other users in the same line. Each user requires a minimum reward.

The objective is to develop an algorithm that maximizes the sum of the rewards of all users, subject to the constraint that each user's reward meets its predefined minimum required value. Evaluating $f$ is computationally expensive, and the number of possible arrangements is too large for exhaustive search.
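To make the setup concrete, here is a minimal Python sketch of how one arrangement would be scored. Everything named here is an assumption for illustration: `toy_f` is only a placeholder for the real (expensive) $f$, users are represented by integer ids, and `min_reward` is a hypothetical mapping from user id to that user's minimum. The sketch also assumes that exactly meeting the minimum counts as acceptable.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Signature mirrors f(j, U_{i,1}, ..., U_{i,M}); j is a 1-based position.
RewardFn = Callable[[int, Sequence[int]], float]


def toy_f(j: int, line: Sequence[int]) -> float:
    """Placeholder reward (NOT the real f): user id divided by position."""
    return line[j - 1] / j


def evaluate(
    lines: List[List[int]],
    f: RewardFn,
    min_reward: Dict[int, float],
) -> Tuple[float, bool]:
    """Return (sum of all rewards, whether every user meets its minimum)."""
    total = 0.0
    feasible = True
    for line in lines:
        # f only sees users of the same line, so each line is scored on its own.
        for j, user in enumerate(line, start=1):
            reward = f(j, line)
            total += reward
            if reward < min_reward[user]:
                feasible = False
    return total, feasible


if __name__ == "__main__":
    lines = [[3, 1, 2], [6, 4, 5]]          # N = 2 lines, M = 3 positions
    minimums = {user: 0.5 for line in lines for user in line}
    print(evaluate(lines, toy_f, minimums))
```

Because $f$ depends only on the users in the same line and users cannot change lines, the total is a sum of independent per-line terms, so each line could in principle be optimized separately.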

I think reinforcement learning (RL) methods can be used, but I do not know how. A brief explanation of the RL approach would also be helpful. How can we develop a simple RL method that converges in a reasonable number of iterations and amount of time?
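One possible framing, which I am not sure is the right one, is to treat the current arrangement as the state, a swap of two positions within one line as the action, and the change in total reward as the RL reward. The sketch below is only an environment under those assumptions, not a learning algorithm. `SwapEnv`, `toy_f`, and the fixed `penalty` charged for each user below its minimum are all hypothetical names and choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

RewardFn = Callable[[int, Sequence[int]], float]


def toy_f(j: int, line: Sequence[int]) -> float:
    """Placeholder reward (NOT the real f): user id divided by 1-based position."""
    return line[j - 1] / j


class SwapEnv:
    """State: all lines. Action: (line index, pos a, pos b), all 0-based."""

    def __init__(
        self,
        lines: List[List[int]],
        f: RewardFn,
        min_reward: Dict[int, float],
        penalty: float = 10.0,  # assumed cost per user below its minimum
    ) -> None:
        self.lines = [list(line) for line in lines]  # copy; caller's data untouched
        self.f = f
        self.min_reward = min_reward
        self.penalty = penalty
        self.scores = [self._score(line) for line in self.lines]

    def _score(self, line: Sequence[int]) -> float:
        """Sum of rewards in one line, minus a penalty per violated minimum."""
        score = 0.0
        for j, user in enumerate(line, start=1):
            reward = self.f(j, line)
            score += reward
            if reward < self.min_reward[user]:
                score -= self.penalty
        return score

    def total(self) -> float:
        return sum(self.scores)

    def step(self, action: Tuple[int, int, int]) -> float:
        """Apply a swap and return the change in penalized total reward."""
        i, a, b = action
        line = self.lines[i]
        line[a], line[b] = line[b], line[a]
        # Only line i changed, so only line i needs new calls to f.
        new_score = self._score(line)
        delta = new_score - self.scores[i]
        self.scores[i] = new_score
        return delta


if __name__ == "__main__":
    env = SwapEnv([[1, 2, 3]], toy_f, {1: 0.0, 2: 0.0, 3: 0.0})
    print(env.step((0, 0, 2)), env.lines)
```

Re-scoring only the affected line keeps each step to $M$ calls of $f$ instead of $N \cdot M$, which matters if $f$ is expensive.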

Thanks