How to normalize time series data

Forgive me if I am using the wrong terminology. I am trying to graph how productive a machine is over time with incomplete data. I am polling the machine at a random interval and getting the total number of parts produced at that moment in time. From that, I get a list of times and part counts like this:

seconds   part count
      0            0
     64            2
    145            5
    271            9
    282           10
    365           12
    445           14
    511           17
    618           20

The list above simulates polling (at random intervals) a process that produces a part every 30 seconds. When I graph this, it does not seem to accurately represent the underlying process (which should be a straight line). I want to highlight times when the machine is idle or running slowly. Is there a way to approximate when each part was made?

There are 1 best solutions below

You say when you graph it against a straight line, your dataset is not well represented. Let's plot your data and $t/30$ where $t$ is time in seconds.

Mathematica graphics

This is actually pretty good agreement. The line of best fit is $t = -4.61 + 30.8p$, where $t$ is the time in seconds to produce $p$ parts, so the mean time to produce a part seems to be 30.8 seconds. This fit has $R^2 = 0.997{\dots}$, adjusted $R^2 = 0.996{\dots}$ and $p$-value $3.8\times 10^{-10}$, so we should be utterly astonished if the parts are produced at a linear rate substantially different from the one extracted. Let's see if there is any visible difference in the graphs.
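For readers who want to reproduce the fit without Mathematica, here is a minimal sketch in plain Python. It regresses time on part count using the data from the question. The helper name `least_squares` is made up for this illustration, and only the slope, intercept and $R^2$ are computed (not the adjusted $R^2$ or the $p$-value).

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit ys ~ intercept + slope * xs.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Polled data from the question.
seconds = [0, 64, 145, 271, 282, 365, 445, 511, 618]
parts = [0, 2, 5, 9, 10, 12, 14, 17, 20]

# Time is the response and part count is the predictor, so the slope
# is the estimated number of seconds per part.
intercept, slope, r_squared = least_squares(parts, seconds)
print(f"t = {intercept:.2f} + {slope:.1f} p,  R^2 = {r_squared:.3f}")
```

Regressing $t$ on $p$ (rather than $p$ on $t$) makes the slope read directly as seconds per part.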

Mathematica graphics

There are a few minuscule differences. But this isn't the correct model for this process. Parts are not fractionally produced -- each part is finished all at once. The correct model is a step function. Let's look at the data with steps every 30 seconds.

Mathematica graphics

(Each vertical jump corresponds to the time of a part being produced according to a regular schedule. These are at multiples of 30 seconds.)

There is no evidence of slow production (which would be indicated by data points below the graph). There is evidence for fast production, a data point above the graph. If we switch to the fitted model, where the run starts about 4.6 seconds before time zero, the analysis is similar, with more evidence for early production.
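The comparison against a step schedule can also be done numerically. The sketch below assumes part $k$ is finished at time `offset + k * period`, so the scheduled count at time $t$ is $\lfloor (t - \text{offset})/\text{period} \rfloor$. The function names are invented for this example, and the fitted values 30.8 and $-4.61$ are the rounded ones quoted above, so an observation sitting very close to a step edge could be classified differently with unrounded coefficients.

```python
import math


def scheduled_count(t, period, offset=0.0):
    """Parts finished by time t if part k completes at offset + k * period."""
    return max(0, math.floor((t - offset) / period))


def compare(observations, period, offset=0.0):
    """Split observations into those ahead of and behind the schedule.

    Each returned entry is (time, observed_count, scheduled_count).
    """
    ahead, behind = [], []
    for t, count in observations:
        expected = scheduled_count(t, period, offset)
        if count > expected:
            ahead.append((t, count, expected))   # point above the step graph
        elif count < expected:
            behind.append((t, count, expected))  # point below the step graph
    return ahead, behind


# Polled data from the question: (seconds, part count).
observations = [(0, 0), (64, 2), (145, 5), (271, 9), (282, 10),
                (365, 12), (445, 14), (511, 17), (618, 20)]

# Nominal schedule: one part every 30 seconds, starting at time zero.
ahead_30, behind_30 = compare(observations, period=30)

# Fitted schedule: 30.8 seconds per part, run starting 4.61 seconds early.
ahead_fit, behind_fit = compare(observations, period=30.8, offset=-4.61)

print("30 s schedule:  ahead", ahead_30, "behind", behind_30)
print("fitted schedule: ahead", ahead_fit, "behind", behind_fit)
```

Entries in `behind` would be the candidates for idle or slow periods; entries in `ahead` are polls where more parts had been counted than the schedule predicts.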

Mathematica graphics

(Each vertical jump corresponds to the time of a part being produced according to a regular schedule. These are at multiples of $30.8{\dots} $ seconds, all shifted by $-4.61{\dots}$ seconds.)