Maximizing the sum of the rewards of users (waiting at different positions in different lines) using reinforcement learning or other learning methods

Asked by Bumbble Comm on 26 Mar 2026 at 9:09

There is a mathematical problem that I think can be solved using reinforcement learning, and it would be great if you could help me with it.

Users are standing in $N$ lines. Each line has $M$ positions and $M$ users, one user per position.

$U_{i,j}$, with $1\leq i\leq N$ and $1\leq j \leq M$, denotes the user at the $j$-th position in the $i$-th line.

Each user receives a reward computed by a function $f$, which takes the following arguments:

$$\text{Reward of } U_{i,j} = f\left(j,\, U_{i,1},\, U_{i,2},\, \dots,\, U_{i,M}\right)$$
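A minimal Python sketch of this setup, assuming each user can be represented by a single number and using a hypothetical reward function `toy_f` (the real $f$ is not given in the question). Indices are 0-based, so `lines[i][j]` plays the role of $U_{i+1,j+1}$. The point it illustrates is that a user's reward depends on its position $j$ and on the entire ordering of its own line.

```python
from typing import Callable, List, Sequence

# Assumption: a user is represented by a single numeric attribute.
User = float
# f(j, line) -> reward of the user at position j, given the full ordering of that line.
RewardFn = Callable[[int, Sequence[User]], float]


def toy_f(j: int, line: Sequence[User]) -> float:
    """Hypothetical stand-in for f: positions nearer the front pay more,
    and a user loses reward when a neighbour has a larger value."""
    me = line[j]
    neighbours = [line[k] for k in (j - 1, j + 1) if 0 <= k < len(line)]
    crowding = sum(max(0.0, n - me) for n in neighbours)
    return me * (len(line) - j) - crowding


def user_rewards(line: Sequence[User], f: RewardFn) -> List[float]:
    """Reward of every user in one line, i.e. f(j, U_{i,1}, ..., U_{i,M}) for each j."""
    return [f(j, line) for j in range(len(line))]


def total_reward(lines: List[List[User]], f: RewardFn) -> float:
    """Sum of the rewards of all users in all N lines."""
    return sum(sum(user_rewards(line, f)) for line in lines)


if __name__ == "__main__":
    lines = [[3.0, 1.0, 2.0], [0.5, 2.5, 1.5]]  # N = 2 lines, M = 3 users each
    print(user_rewards(lines[0], toy_f))  # [9.0, -1.0, 2.0]
    print(total_reward(lines, toy_f))     # 15.0
```

Any real $f$ can be plugged in, provided it is wrapped to the signature `f(j, line)`.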

Users cannot move to another line, but users in the same line can swap positions with each other. Each user requires a minimum reward.
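Since users may only exchange places within their own line, a natural elementary move is swapping the users at two positions of one line (there are $M(M-1)/2$ such swaps per line). A minimal sketch of that move and of the minimum-reward check, assuming a single threshold `r_min` shared by all users (the question may intend per-user minimums) and a hypothetical reward function `toy_f`:

```python
from typing import Callable, List, Sequence

RewardFn = Callable[[int, Sequence[float]], float]


def toy_f(j: int, line: Sequence[float]) -> float:
    """Hypothetical stand-in for f: the user's value weighted by closeness to the front."""
    return line[j] * (len(line) - j)


def swap(line: Sequence[float], a: int, b: int) -> List[float]:
    """Return a copy of the line with the users at positions a and b exchanged."""
    new_line = list(line)
    new_line[a], new_line[b] = new_line[b], new_line[a]
    return new_line


def is_feasible(line: Sequence[float], f: RewardFn, r_min: float) -> bool:
    """True if every user in the line receives at least r_min (assumed common threshold)."""
    return all(f(j, line) >= r_min for j in range(len(line)))


if __name__ == "__main__":
    line = [1.0, 2.0, 3.0]
    print(is_feasible(line, toy_f, r_min=2.5))              # True: rewards are 3, 4, 3
    print(is_feasible(swap(line, 0, 2), toy_f, r_min=2.5))  # False: rewards are 9, 4, 1
```

The example also shows why the constraint matters: the swap raises the line's total reward from 10 to 14 but leaves one user below the minimum.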

The objective is to develop an algorithm that maximizes the sum of the rewards of all users, subject to each user's reward being at least a predefined minimum value. Evaluating $f$ is computationally expensive, and the number of possible arrangements ($M!$ orderings per line, so $(M!)^N$ in total) is far too large for exhaustive search.
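One way to state the problem formally, using notation that is not in the question: let $u_{i,1},\dots,u_{i,M}$ be the fixed set of users in line $i$, let $\sigma_i$ be a permutation of $\{1,\dots,M\}$ that places user $u_{i,\sigma_i(j)}$ at position $j$, and let $r_{\min}$ be the required minimum reward (assumed equal for all users).

$$
\begin{aligned}
\max_{\sigma_1,\dots,\sigma_N}\quad & \sum_{i=1}^{N}\sum_{j=1}^{M} f\big(j,\, u_{i,\sigma_i(1)},\, \dots,\, u_{i,\sigma_i(M)}\big)\\
\text{s.t.}\quad & f\big(j,\, u_{i,\sigma_i(1)},\, \dots,\, u_{i,\sigma_i(M)}\big) \ge r_{\min}, \qquad 1\le i\le N,\ 1\le j\le M.
\end{aligned}
$$

As written, $f$ depends only on the ordering of line $i$, and users never leave their line, so both the objective and the constraints separate by line: the problem splits into $N$ independent subproblems of $M!$ orderings each, rather than one problem over $(M!)^N$ joint arrangements.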

I think reinforcement learning (RL) methods could be used, but I do not know how. A brief explanation of the RL approach would also help. How can we develop a simple RL method that converges in a reasonable number of iterations and amount of time?
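One common RL framing, offered here as an assumption rather than the only option: the state is the current ordering of one line, an action is a swap of two positions, and the reward is the change in a penalized objective (total reward minus a penalty for every unit of shortfall below the minimum). Because, as written, each user's reward depends only on its own line, the sketch handles a single line. It is a minimal tabular Q-learning example with a hypothetical toy $f$, hypothetical `R_MIN` and `PENALTY` values, and untuned hyperparameters; `functools.lru_cache` is used so that no ordering is scored twice, since evaluating $f$ is expensive.

```python
import random
from functools import lru_cache
from itertools import combinations
from typing import Dict, Tuple

State = Tuple[float, ...]  # one line's ordering: user values by position
Action = Tuple[int, int]   # swap the users at these two positions

R_MIN = 2.5     # hypothetical minimum reward per user
PENALTY = 10.0  # hypothetical weight per unit of shortfall below R_MIN


@lru_cache(maxsize=None)  # f is expensive: each ordering is evaluated only once
def score(state: State) -> float:
    """Penalized objective of one line: sum of rewards minus penalty * total shortfall."""
    m = len(state)
    rewards = [state[j] * (m - j) for j in range(m)]  # hypothetical toy f(j, line)
    shortfall = sum(max(0.0, R_MIN - r) for r in rewards)
    return sum(rewards) - PENALTY * shortfall


def apply_swap(state: State, action: Action) -> State:
    """Return the ordering obtained by exchanging the users at the two positions."""
    a, b = action
    s = list(state)
    s[a], s[b] = s[b], s[a]
    return tuple(s)


def q_learning(start: State, episodes: int = 500, steps: int = 20,
               alpha: float = 0.1, gamma: float = 0.9, eps: float = 0.2,
               seed: int = 0) -> State:
    """Tabular Q-learning over swap actions; returns the best ordering visited."""
    rng = random.Random(seed)
    actions = list(combinations(range(len(start)), 2))
    q: Dict[Tuple[State, Action], float] = {}
    best = start
    for _ in range(episodes):
        state = start  # every episode restarts from the initial ordering
        for _ in range(steps):
            if rng.random() < eps:  # explore: random swap
                action = rng.choice(actions)
            else:                   # exploit: swap with the highest current Q estimate
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            nxt = apply_swap(state, action)
            reward = score(nxt) - score(state)  # improvement produced by this swap
            future = max(q.get((nxt, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            # Standard Q-learning update toward reward + discounted best future value.
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
            if score(state) > score(best):
                best = state
    return best


if __name__ == "__main__":
    start = (1.0, 2.0, 3.0, 4.0)  # one line with M = 4 toy users
    best = q_learning(start)
    print(best, score(best))
```

Returning the best ordering visited, rather than relying only on the final greedy policy, is a pragmatic choice for an optimization task. Note that the table grows with $M!\cdot M(M-1)/2$ entries, so this only works for very small lines; for realistic $M$ the table would have to be replaced by a function approximator, or the same swap moves could drive a non-RL local search such as simulated annealing.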

Thanks
