Optimization of Dynamic Warehouse Delivery


I have an optimization problem for the delivery of boxes between a warehouse and production lines within a small facility. I need to determine how many transport vehicles, utilities, and warehouse workers to buy, such that cost and downtime are minimized (downtime can be considered a cost as well). But I don't know how to best approach this: integer programming, reinforcement learning, simulation software, or something else?

There is one warehouse (assume infinite stock). There are multiple production lines, $n \in N = \{1, 2, \dots\}$. At specific times $t \in T = \{0, 1, 2, \dots, 28800\}$ (the set of time indices in seconds; the horizon is 8 hours), a line can put in an order for boxes from the warehouse. The boxes contain components, which each line uses up at a fixed rate. $r_{mnt}$ is the amount of component $m$ remaining at line $n$ at time $t$. When $r_{mnt} = 0$ for any $m$ at line $n$, line $n$ stops (and does not use any other components) until an order arrives and $r_{mnt} > 0$ again. $d_{nt} = 1$ if line $n$ is down during $t$, and $0$ otherwise. $c_n$ is a constant cost per second for line $n$ being down.
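To make the depletion/downtime bookkeeping concrete, here is a minimal sketch for a single line and a single component; the consumption rate, initial stock, and arrival schedule are all hypothetical placeholders:

```python
# Sketch of the stock/downtime bookkeeping described above, for one line
# and one component. Rate, initial stock, and arrivals are hypothetical.

def simulate_line(rate, initial, arrivals, horizon):
    """Track remaining stock r_t and downtime d_t for one line.

    rate     -- components consumed per second while running
    initial  -- stock at t = 0
    arrivals -- dict {t: quantity delivered at second t}
    horizon  -- number of seconds to simulate
    Returns (total downtime in seconds, final stock).
    """
    r = initial
    downtime = 0
    for t in range(horizon):
        r += arrivals.get(t, 0)   # an order may arrive at second t
        if r <= 0:
            downtime += 1         # d_t = 1: line is stopped this second
        else:
            r -= rate             # line runs and consumes at its fixed rate
    return downtime, r
```

With no deliveries, a line starting with 5 units and consuming 1 unit/second runs dry after 5 seconds and is down for the remaining 5 of a 10-second horizon.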

When a line orders, an available worker in the warehouse will immediately take a fixed amount of time to load all boxes of that order all at once onto a utility (something that holds boxes, like a wagon). The worker is then immediately free to serve the next loading request (or remain idle). The loaded utility then waits for a vehicle to come and pick it up to deliver to the line. After dropping off the utility at the line, the vehicle is then immediately free to serve the next transport request (or return back to its home in the warehouse). At the line, there is always a worker who is available, who will take a fixed amount of time to unload all the boxes from the utility. The empty utility then waits for a vehicle to come and pick it up to take back to the warehouse.
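To make that flow concrete, here is the timeline of a single order in the uncontended case (a worker, utility, and vehicle are all free when the order is placed); the service times are hypothetical:

```python
# Timeline of one order: load -> travel -> unload -> return the empty
# utility. This is the no-contention case; a full model would also track
# waiting for workers, utilities, and vehicles.

def deliver_order(t0, load_time, travel_time, unload_time):
    """Return the timestamped events for one order placed at time t0."""
    events = []
    t = t0
    events.append((t, "load_start"))
    t += load_time
    events.append((t, "load_done"))       # warehouse worker freed here
    t += travel_time                       # vehicle: warehouse -> line
    events.append((t, "arrive_at_line"))
    t += unload_time
    events.append((t, "unload_done"))      # boxes available; line can resume
    t += travel_time                       # vehicle returns the empty utility
    events.append((t, "utility_back"))
    return events
```

For example, with a 60 s load, 120 s travel, and 30 s unload, the boxes are available 210 s after the order and the utility is back in the warehouse at 330 s.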

The unknowns/decision variables are:

  • $v$, the number of vehicles to buy; each vehicle has a one-time cost $c_v$.

  • $u$, the number of utilities to buy; each utility has a one-time cost $c_u$.

  • $w$, the number of workers in the warehouse; each worker has a one-time cost $c_w$.

The objective function is to minimize total cost (vehicles + utilities + warehouse workers + downtime): $C_{total} = c_v v + c_u u + c_w w + \sum_n\sum_t c_n d_{nt}$
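As a sanity check, the objective is straightforward to evaluate once the downtime of each line is known (all numbers below are made up):

```python
# Direct evaluation of C_total = c_v*v + c_u*u + c_w*w + sum_n c_n * downtime_n.

def total_cost(v, u, w, downtime_by_line, c_v, c_u, c_w, c_line):
    """downtime_by_line[n] = total seconds line n was down (sum over t of d_nt);
    c_line[n] = cost per second of downtime for line n."""
    capital = c_v * v + c_u * u + c_w * w
    downtime = sum(c_line[n] * d for n, d in enumerate(downtime_by_line))
    return capital + downtime
```

The hard part is, of course, that the downtime terms are not inputs but consequences of $v, u, w$ and the dispatching decisions.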

This problem is about optimally allocating resources (workers, vehicles, and utilities) over time, but the availability of these resources changes very dynamically and has interdependencies. A warehouse worker and an empty utility must both be available in the warehouse to load an order; this depends on when a worker becomes free and when an empty utility is brought back. Loaded utilities must wait for a vehicle to come and pick them up, which depends on when a vehicle becomes available and where it currently is, since the time to travel to the utility varies with the vehicle's location ($s_n$ is the fixed travel time between the warehouse and line $n$). After pickup, it takes another $s_n$ for the vehicle to reach the utility's destination. All of this affects the timeliness of deliveries, and thus whether and when $d_{nt} = 1$; it also affects when and what a line will order at its next ordering point. Furthermore, a vehicle can end up effectively dedicated to one line (if that line's boxes are used up very quickly), so the priority rule for utility pickups may not be as simple as the "next available" or "closest" vehicle.
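For illustration, even the simple "closest available vehicle" rule already needs each vehicle's free time and location; this sketch picks the vehicle that can reach a waiting utility earliest, approximating line-to-line travel via the warehouse (location 0), with all names and travel times hypothetical:

```python
# Dispatch rule sketch: choose the vehicle with the earliest feasible
# pickup time. vehicles = list of (free_at, location); location 0 is the
# warehouse; s maps line -> one-way travel time to/from the warehouse.

def pick_vehicle(vehicles, request_line, request_time, s):
    """Return (vehicle index, pickup time) for the utility at request_line."""
    def pickup_time(free_at, loc):
        start = max(free_at, request_time)
        if loc == request_line:
            travel = 0
        else:
            # Approximation: route any cross-facility move via the warehouse.
            travel = s.get(loc, 0) + s.get(request_line, 0)
        return start + travel

    best = min(range(len(vehicles)),
               key=lambda i: pickup_time(*vehicles[i]))
    return best, pickup_time(*vehicles[best])
```

Note that a vehicle that becomes free later but already sits at the requesting line can still win, which hints at why "next available" and "closest" are genuinely different rules.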

  • Would integer programming be the right approach for this? It feels too complicated for integer programming, since actions/states can affect future actions/states.

  • This makes me think of reinforcement learning. But wouldn't selecting $v, u, w$ be just a one-time action at the start of an episode, not something that is explored/exploited over the course of an episode? How would the states and actions be framed for reinforcement learning?

  • The last approach I can think of is simulation software, which seems to be the "simplest": I'd program the system behaviour, change $v, u, w$ for each simulation run, then compare the results of the runs. But this is a very trial-and-error approach, and not necessarily optimal...
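The trial-and-error loop in the last bullet can at least be made systematic: enumerate $(v, u, w)$ over small ranges and keep the cheapest run. Here `run_simulation` is only a stub with a made-up cost surface, standing in for a real simulator:

```python
import itertools

def run_simulation(v, u, w):
    # Stub: pretend downtime cost falls off as resources are added.
    downtime_cost = 10_000 // (v * u * w)
    capital = 1_000 * v + 300 * u + 500 * w
    return capital + downtime_cost

def grid_search(max_v=5, max_u=5, max_w=5):
    """Exhaustively evaluate all (v, u, w) combinations; return the best."""
    best = None
    for v, u, w in itertools.product(range(1, max_v + 1),
                                     range(1, max_u + 1),
                                     range(1, max_w + 1)):
        cost = run_simulation(v, u, w)
        if best is None or cost < best[0]:
            best = (cost, (v, u, w))
    return best
```

With real simulation times this full enumeration may be too slow, which is exactly where smarter search (or an exact model) comes in.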

Best answer:

Assuming that none of the parameters (costs, item volumes, processing times) are random, simulation would not be appropriate. A metaheuristic would act somewhat like a simulation (particularly a metaheuristic that included randomness in its decision making); it would build a candidate solution, evaluate it, change things (likely somewhat randomly), reevaluate and repeat. It would also bear some resemblance to reinforcement learning, although what was "learned" would apply only to your specific problem instance and would not generalize.
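For illustration, that build–evaluate–perturb loop applied to $(v, u, w)$ might look like the following, with `evaluate` as a stand-in toy cost function rather than your real model:

```python
import random

def evaluate(v, u, w):
    # Toy deterministic cost surface standing in for the real evaluation.
    return 1_000 * v + 300 * u + 500 * w + 10_000 // (v * u * w)

def local_search(start=(1, 1, 1), iters=200, seed=0):
    """Greedy local search: perturb one coordinate by +/-1, keep improvements.
    A real metaheuristic (e.g. simulated annealing) would also sometimes
    accept worse moves to escape local optima."""
    rng = random.Random(seed)
    best = current = start
    best_cost = evaluate(*start)
    for _ in range(iters):
        i = rng.randrange(3)
        cand = list(current)
        cand[i] = max(1, cand[i] + rng.choice((-1, 1)))  # stay >= 1
        cand = tuple(cand)
        cost = evaluate(*cand)
        if cost <= best_cost:
            best, best_cost = cand, cost
            current = cand
    return best, best_cost
```

The "learning" here is exactly as described: it only tells you about this instance's cost surface, nothing transferable.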

That said, I would try a mixed integer linear program first, particularly if you could get by with a coarser unit of time than seconds (thus reducing the cardinality of $T$ and consequently the number of variables and constraints). MIP models for scheduling problems where states are dependent on past decisions are fairly common. Complexity of the problem statement is not the main issue. Problem size / dimensions might be a concern, depending on the tightness of the model and the available software and computational resources. If I tried a MIP model and could not get acceptable results, I might then switch to a metaheuristic.
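One possible core for such a time-indexed model, sticking to the question's notation and introducing two new symbols ($a_{mn}$, line $n$'s consumption rate of component $m$, and $q_{mnt}$, the quantity of $m$ delivered to line $n$ at time $t$), would be constraints along these lines:

```latex
\begin{align}
  r_{mnt} &= r_{mn,t-1} + q_{mnt} - a_{mn}\,(1 - d_{nt})
    && \text{inventory balance} \\
  a_{mn}\,(1 - d_{nt}) &\le r_{mn,t-1} + q_{mnt}
    && \text{line runs only if stock suffices} \\
  d_{nt} &\in \{0, 1\}
\end{align}
```

Because the objective penalizes downtime, the solver will set $d_{nt} = 0$ whenever the stock constraint permits. The hard part is the additional constraints tying $q_{mnt}$ to worker, utility, and vehicle availability and to the travel times $s_n$, which is where the variable count driven by $|T|$ really bites.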