Although the question is about scaling servers up and down, the solution is a mathematical formula based on heuristics, which is why I am asking here. Apologies in advance if this is the wrong place. I am also unsure which tag fits best.
Situation:
There are tasks that need to be executed by servers. The ratio should preferably be 1:1, that is, one server per task. Once a task is finished, its server shuts down. The server supplier offers "low-priority" servers, which are cheaper than dedicated servers, but does not say how many low-priority servers are available at any given time. Ideally, the formula would determine the number of servers needed and spin up as many of them as possible as low-prio. If not all servers can be spun up as low-prio, the rest should be spun up as dedicated servers.
The metrics available for defining the formula are listed here:
https://docs.microsoft.com/en-us/azure/batch/batch-automatic-scaling#metrics
Thoughts + additional information:
The maximum number of servers is 1000.
The formula is re-run every 5 minutes, and samples from the past are available in the formula. Sample data includes:
An average over the sample data can be calculated like this:
averageActiveTasks = avg($ActiveTasks.GetSample(TimeInterval_Minute * 5))
Say that there are 200 active tasks when running the formula for the first time. This means that 200 servers are needed. The number of available low-prio servers is unknown, so we can only guess, either optimistically or conservatively.
To keep it conservative, let's ask for 50 low-prio and 150 dedicated.
When the formula runs again 5 minutes later, we will have data on how many low-prio servers were actually running.
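To make the first-run arithmetic concrete, here is a minimal sketch in plain Python (not autoscale formula syntax). The function name `first_request` and the parameter `low_prio_guess` are made up for illustration. The guess of 50 is simply the conservative number from the example, not something derived from data.

```python
MAX_SERVERS = 1000  # pool limit stated above


def first_request(active_tasks: int, low_prio_guess: int) -> tuple:
    """Split the first request into (low_prio, dedicated) targets.

    low_prio_guess is an assumption: the real low-prio availability
    is unknown when the formula runs for the first time.
    """
    needed = min(active_tasks, MAX_SERVERS)  # one server per task, capped
    low_prio = min(low_prio_guess, needed)   # never ask for more than needed
    dedicated = needed - low_prio            # cover the rest with dedicated
    return low_prio, dedicated


print(first_request(200, 50))  # (50, 150)
```

The open question is what to pass as `low_prio_guess` on later runs, once there is data on how many low-prio servers were actually granted.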
Case:
We received 20 low-prio servers, 30 fewer than we asked for. We received 150 dedicated servers, exactly what we asked for. Total: 170, so we need to spin up 30 more servers for the remaining tasks.
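The numbers in this case can be checked with a small sketch in plain Python (not autoscale formula syntax). The helper name `shortfall` is hypothetical. The sketch also assumes that all 200 tasks are still active after 5 minutes, which the case does not state.

```python
def shortfall(needed: int, running_low_prio: int, running_dedicated: int) -> int:
    """Servers still missing after the previous request was only partly filled."""
    return max(0, needed - (running_low_prio + running_dedicated))


# Numbers from the case above.
requested_low_prio, received_low_prio = 50, 20
received_dedicated = 150
needed = 200  # assumption: no task has finished in the meantime

missing_low_prio = requested_low_prio - received_low_prio
missing_total = shortfall(needed, received_low_prio, received_dedicated)

print(missing_low_prio, missing_total)  # 30 30
```

This only quantifies the gap of 30 servers. It does not decide how those 30 should be split between low-prio and dedicated on the next run.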
How many low priority vs dedicated should we ask for next time? And what if there are 200 new tasks 15 minutes later?
An answer with a reference to a book, article, or other relevant material on this kind of problem would also be welcome.
