Situation:
An oil company collects waste oil from small car and truck workshops. Each workshop has only limited waste-oil storage, and the capacity varies per workshop.
Oil must be collected from a workshop when its storage is almost full. At the moment, a collection task is created either 1) by the workshop placing a phone order or 2) by the oil company driving by on its own initiative, without any knowledge of how much oil to expect at the workshop.
The problems:
- Oil-collecting company arrives too late. Storage capacity already reached.
- Oil-collecting company arrives way too early and thus can only collect a small amount of oil.
What do we have?
Collection data for the past 3 years for each workshop. Dates and the volumes collected.
Preferred solution:
Arriving in time, when the customer's storage (by our calculation) reaches 80% of its total capacity. Ideally we arrive right before the workshop would place a phone order to have its waste oil collected.
Questions:
Which mathematical approach is best for predicting the date on which the next collection should take place? Which Python 3 libraries can help me solve this problem?
Some demo data: Total capacity is 2400L
{
"26-01-2015":740,
"09-02-2015":1380,
"27-02-2015":700,
"31-03-2015":820,
"22-04-2015":860,
"11-05-2015":880,
"01-06-2015":960,
"25-06-2015":980,
"27-07-2015":940,
"14-08-2015":1000,
"04-09-2015":1420,
"23-09-2015":1060,
"13-10-2015":1260,
"02-11-2015":940,
"24-11-2015":780,
"15-12-2015":1100,
"05-01-2016":1100,
"01-02-2016":1280,
"25-02-2016":1020,
"16-03-2016":560,
"13-04-2016":1320,
"03-05-2016":1160,
"27-05-2016":1420,
"20-06-2016":1100,
"11-07-2016":900,
"27-07-2016":800,
"22-08-2016":1120,
"12-09-2016":780,
"11-10-2016":940,
"26-10-2016":2000,
"17-11-2016":880,
"05-12-2016":1080,
"22-12-2016":740,
"10-01-2017":780,
"30-01-2017":860,
"20-02-2017":1200,
"14-03-2017":1100,
"06-04-2017":1080,
"25-04-2017":900,
"18-05-2017":800,
"13-06-2017":820,
"05-07-2017":1600,
"27-07-2017":720,
"16-08-2017":1000,
"04-09-2017":710,
"02-10-2017":1020,
"20-10-2017":2400,
"25-10-2017":800,
"16-11-2017":560,
"08-12-2017":660,
"03-01-2018":920,
"25-01-2018":700,
"26-02-2018":1540
}
The provided data is not sufficient to fully answer this question. If the data is a complete record of one station over three years, I would suggest using a linear regression model to predict the average daily increase in stored oil. You will need to transform the data so that the volume increases over time (a cumulative series), while still marking the points where waste was collected. The training data should not contain any decrease in volume.
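A minimal sketch of that idea, using only the standard library and the first few entries of the demo data. It assumes the tank is emptied at every visit, so each collected volume is what accumulated since the previous visit; the question does not confirm this. The function names are made up for illustration, and the least-squares fit is written by hand (scikit-learn's LinearRegression would do the same job).

```python
import math
from datetime import datetime, timedelta

CAPACITY_L = 2400       # total capacity stated in the question
TARGET_FRACTION = 0.8   # collect when the tank is estimated to be 80% full

# First entries of the demo data (DD-MM-YYYY -> litres collected).
collections = {
    "26-01-2015": 740,
    "09-02-2015": 1380,
    "27-02-2015": 700,
    "31-03-2015": 820,
    "22-04-2015": 860,
    "11-05-2015": 880,
}

def parse(records):
    """Return (date, litres) pairs sorted by date."""
    return sorted(
        (datetime.strptime(d, "%d-%m-%Y").date(), v) for d, v in records.items()
    )

def cumulative_series(rows):
    """Days since the first visit vs. total litres produced since that visit.

    Assumption: every visit empties the tank, so the first volume (whose
    accumulation period is unknown) only marks the starting point.
    """
    start = rows[0][0]
    days, totals, running = [0], [0], 0
    for day, litres in rows[1:]:
        running += litres
        days.append((day - start).days)
        totals.append(running)
    return days, totals

def fit_line(x, y):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def next_collection_date(records, capacity=CAPACITY_L, fraction=TARGET_FRACTION):
    """Estimate the date on which the tank reaches `fraction` of `capacity`."""
    rows = parse(records)
    days, totals = cumulative_series(rows)
    rate, _ = fit_line(days, totals)  # slope = average litres per day
    days_to_target = math.floor(fraction * capacity / rate)  # round down: arrive early
    return rows[-1][0] + timedelta(days=days_to_target), rate

next_date, rate = next_collection_date(collections)
print(f"about {rate:.1f} L/day, next collection around {next_date}")
```

The slope of the cumulative line is the average fill rate, so the planned date is just the last visit plus the number of days needed to produce 80% of the capacity at that rate.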
One point to consider: it sounds as if the stations are independent of each other, which means you can fit a separate model for each station.
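Fitting per station could look roughly like the sketch below. The data layout, the workshop names and the `daily_rate` and `plan` helpers are all hypothetical. Workshop "A" reuses the first demo entries, while workshop "B" is invented purely to show a second, smaller tank. As above, it assumes each visit empties the tank. For brevity the per-station "model" here is just the average litres per day rather than a full regression.

```python
from datetime import date, timedelta

# Hypothetical layout: one record per workshop, each with its own capacity
# and collection history. "A" is taken from the demo data, "B" is invented.
workshops = {
    "A": {
        "capacity": 2400,
        "history": [
            (date(2015, 1, 26), 740),
            (date(2015, 2, 9), 1380),
            (date(2015, 2, 27), 700),
            (date(2015, 3, 31), 820),
        ],
    },
    "B": {
        "capacity": 1000,
        "history": [
            (date(2015, 1, 5), 400),
            (date(2015, 2, 4), 450),
            (date(2015, 3, 6), 420),
        ],
    },
}

def daily_rate(history):
    """Average litres per day: volume collected after the first visit
    divided by the days elapsed since that visit."""
    history = sorted(history)
    litres = sum(volume for _, volume in history[1:])
    days = (history[-1][0] - history[0][0]).days
    return litres / days

def plan(workshops, fraction=0.8):
    """Return {workshop: estimated date at which `fraction` of capacity is reached}."""
    schedule = {}
    for name, info in workshops.items():
        rate = daily_rate(info["history"])
        last_visit = max(day for day, _ in info["history"])
        days_to_target = int(fraction * info["capacity"] / rate)  # round down
        schedule[name] = last_visit + timedelta(days=days_to_target)
    return schedule

schedule = plan(workshops)
for name, when in schedule.items():
    print(name, when)
```

Because every workshop gets its own rate and its own capacity, a badly behaved station does not distort the schedule of the others.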
SVMs could work well too. For neural networks, the data seems too sparse. All the methods you need are implemented in scikit-learn and pandas.