In our system, jobs are queued with inter-arrival times that follow a distribution we determined from empirical data; a sample is below:
| Time between consecutive jobs (s) | Occurrences |
|---|---|
| 0 | 257 |
| 1 | 374 |
| 2 | 127 |
| 3 | 85 |
| 4 | 73 |
| 5 | 66 |
| 6 | 65 |
| 7 | 63 |
| 8 | 73 |
| 9 | 52 |
| 10 | 60 |
| ... | ... |
This is just an excerpt: the time between consecutive jobs goes up into the millions of seconds, i.e. weeks between jobs!
If we know how long a job takes (say, always 600 seconds within a narrow margin), is the full data set, including all data points, enough to determine how many jobs we can expect to be running concurrently, ideally with a standard deviation?
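To make the question concrete, here is a rough numerical sketch of what I have in mind (assumptions: the gaps are treated as i.i.d. draws from the empirical histogram, and `histogram` below contains only the excerpt above, not the full data set). The mean concurrency should follow Little's law, L = λ·W with λ = 1 / (mean gap) and W = 600 s, and a Monte Carlo resample of the gaps gives a spread around that mean:

```python
import random
from collections import deque

# Hypothetical excerpt of the empirical data: {gap in seconds: occurrences}
histogram = {0: 257, 1: 374, 2: 127, 3: 85, 4: 73, 5: 66,
             6: 65, 7: 63, 8: 73, 9: 52, 10: 60}
JOB_DURATION = 600  # seconds, assumed (nearly) fixed

# Flatten the histogram into a list of observed gaps.
gaps = [t for t, n in histogram.items() for _ in range(n)]

# Little's law: mean concurrency = arrival rate * job duration.
mean_gap = sum(gaps) / len(gaps)
expected_concurrency = JOB_DURATION / mean_gap

# Monte Carlo: resample gaps to build a long arrival sequence, then at
# each arrival count the jobs started within the last JOB_DURATION seconds.
random.seed(42)
t = 0
active = deque()   # start times of jobs still running
samples = []
for _ in range(100_000):
    t += random.choice(gaps)
    active.append(t)
    while active[0] <= t - JOB_DURATION:
        active.popleft()
    samples.append(len(active))

mean_c = sum(samples) / len(samples)
sd_c = (sum((x - mean_c) ** 2 for x in samples) / len(samples)) ** 0.5
print(f"Little's law estimate: {expected_concurrency:.1f}")
print(f"Simulated concurrency: {mean_c:.1f} +/- {sd_c:.1f}")
```

One caveat with this sketch: sampling concurrency at arrival instants is only unbiased for Poisson arrivals, and resampling gaps independently throws away any correlation between consecutive gaps, so it may understate the true spread.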