Linear regression of time series data - moving linear regression


Situation

A suitable analogy for my real-world problem is a shop: customers arrive, spend a random amount of time in the shop, and leave. Customer arrivals follow a Poisson process, i.e. the inter-arrival times are independent and exponentially distributed.

Whenever a customer arrives and whenever a customer leaves, the current throughput (customers leaving per second) and number of customers currently in the shop are recorded.

Problem

I want to plot these pairs of values, throughput (Y axis) against the number of customers (X axis), on a virtual plane and find the line of best fit through the points to determine its gradient.

However, as this is time series data, more and more points will accumulate over time, so I need some way of giving greater weight to more recent points, analogous to a rolling average.

Question

The idea I had was to sample departure events in fixed time windows to determine the throughput. At the end of each sampling window, the throughput over that window and the number of customers in the shop at that instant would be plotted as a point. Standard linear regression techniques (e.g. least squares) could then be applied to determine the gradient of the line of best fit.
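A minimal sketch of what I have in mind (the departure timestamps and occupancy values are made-up example data; NumPy's `polyfit` does the ordinary least-squares fit):

```python
import numpy as np

# Hypothetical departure timestamps (seconds) and the number of
# customers in the shop recorded at each departure event.
departures = np.array([0.5, 1.2, 2.8, 3.1, 4.6, 5.0, 6.7, 8.2, 9.9])
occupancy = np.array([3, 4, 4, 5, 3, 4, 2, 3, 2])

window = 2.0   # sampling window length in seconds
t_end = 10.0

xs, ys = [], []  # (customers in shop, throughput) pairs
t = window
while t <= t_end:
    # Throughput = departures in the window (t - window, t], per second.
    in_window = departures[(departures > t - window) & (departures <= t)]
    ys.append(len(in_window) / window)
    # Occupancy at the instant the window ends: last recorded value.
    idx = np.searchsorted(departures, t, side="right") - 1
    xs.append(int(occupancy[idx]))
    t += window

# Ordinary least-squares line of best fit: throughput = m * occupancy + c
m, c = np.polyfit(xs, ys, 1)
```

This treats the occupancy as piecewise constant between events, which seems reasonable since it only changes at arrivals and departures.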

However, as this is time series data, I need a way to limit the number of points kept on the graph, or to discount the significance of points exponentially with age, like a rolling average does. What is the best way to achieve this?

Secondly, some smoothing will no doubt be required to stabilise the result. How could this be done?
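For instance, I could imagine passing the sequence of gradient estimates through an exponential moving average (a sketch; `alpha` is a smoothing parameter I would have to tune):

```python
def ema(values, alpha=0.2):
    """Exponential moving average of a sequence of noisy estimates.
    Smaller alpha means heavier smoothing (slower response)."""
    out, s = [], None
    for v in values:
        s = v if s is None else alpha * v + (1 - alpha) * s
        out.append(s)
    return out
```

But I am not sure whether smoothing the gradient after the fit, or weighting the points before the fit, is the better place to stabilise things.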