How to determine a trendline given a set of values, which represent changes of a particular metric over time?


I would like to calculate a trendline. I have these metrics - minimum, maximum, average, median, mode, range, each point comes with a time that measurement was taken. Is that data sufficient for calculating a trend or do I need some other metrics?
My initial thought is to apply some modification of 'the longest sequence of consecutive natural successors' to find the start and the end point of the trend.
That piece of data comes from a financial domain.
My math background is not a strong one, so if you can point me to some articles to read on this topic - I'll be grateful.
Here's my data sample (17 points, average metric used, date is represented as a number of milliseconds since January 1st, 1970):

 data:[71.911509,71.928993,71.907168,71.947290,71.915951,71.925802,71.930942,71.894793,71.796717,71.723729,71.675488,71.693242,71.690964,71.687208,71.692363,71.683531,71.642414], 

 timestamps:[1449158406112,1449159306393,1449160207362,1449161108128,1449162006643,1449162905018,1449163805674,1449164704940,1449165604534,1449166504159,1449167404237,1449168304924,1449169204815,1449170104565,1449171003940,1449171903956,1449172804378]

Best answer

This question is somewhat vague, but then I suspect you are just getting started with your project. Without knowing more specifics of your data it is hard to say exactly where you should start. Here are some very generic suggestions; I hope some of them are useful in helping you think through this project.

Begin by picking your favorite 'metric' to use for a predicted variable, and plot it against time. If the plot looks roughly linear, use simple linear regression to see if you can find a regression line through the data that satisfies your definition of 'trendline'. If you have no obvious favorite, start with means or medians.
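As a minimal sketch of that first step, here is an ordinary least-squares line fitted to the sample from the question, using only the Python standard library. The function name `fit_line` and the choice to rescale time to hours since the first measurement are my own assumptions for illustration, not anything the question prescribes.

```python
# Sample from the question: 'average' metric and Unix timestamps in milliseconds.
data = [71.911509, 71.928993, 71.907168, 71.947290, 71.915951, 71.925802,
        71.930942, 71.894793, 71.796717, 71.723729, 71.675488, 71.693242,
        71.690964, 71.687208, 71.692363, 71.683531, 71.642414]
timestamps = [1449158406112, 1449159306393, 1449160207362, 1449161108128,
              1449162006643, 1449162905018, 1449163805674, 1449164704940,
              1449165604534, 1449166504159, 1449167404237, 1449168304924,
              1449169204815, 1449170104565, 1449171003940, 1449171903956,
              1449172804378]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Rescale time to hours since the first measurement so the numbers stay small.
hours = [(t - timestamps[0]) / 3_600_000 for t in timestamps]
intercept, slope = fit_line(hours, data)
print(f"trend: {intercept:.4f} + ({slope:.4f}) * hours")
```

The sign of `slope` gives the direction of the trend and its size gives the change per hour. Whether a single straight line is an adequate 'trendline' for these points is a separate question that the plot has to answer.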

It is possible that many of the metrics you mention will behave similarly. If your purpose is to predict what comes next, pick the metric you'd most want to predict. Your ability to predict far into the future will likely be very limited, especially if the underlying economic, social, and other conditions change in ways your data do not capture.

If the data suggest that a simple curve fits better than a line, try various transformations of your metric (the predicted variable). If the metric takes only positive values, you might try logs or square roots.
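To illustrate the transformation idea, the sketch below fits a straight line to the logarithm of the metric, which corresponds to an exponential trend on the original scale. The helper `fit_line` and the variable names are hypothetical; the data are the sample from the question, and nothing here implies that a log transform is the right choice for them.

```python
import math

data = [71.911509, 71.928993, 71.907168, 71.947290, 71.915951, 71.925802,
        71.930942, 71.894793, 71.796717, 71.723729, 71.675488, 71.693242,
        71.690964, 71.687208, 71.692363, 71.683531, 71.642414]
timestamps = [1449158406112, 1449159306393, 1449160207362, 1449161108128,
              1449162006643, 1449162905018, 1449163805674, 1449164704940,
              1449165604534, 1449166504159, 1449167404237, 1449168304924,
              1449169204815, 1449170104565, 1449171003940, 1449171903956,
              1449172804378]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


hours = [(t - timestamps[0]) / 3_600_000 for t in timestamps]

# Fit log(y) = a + b * hours; requires every value to be positive.
log_data = [math.log(v) for v in data]
a, b = fit_line(hours, log_data)

# Back on the original scale the fitted trend is level * exp(b * hours).
level = math.exp(a)
print(f"trend: {level:.4f} * exp({b:.6f} * hours)")
```

Here `b` is an approximate relative change per hour. For a square-root transform, replace `math.log` with `math.sqrt` and square the fitted values to get back to the original scale.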

Another choice would be to fit a quadratic or cubic curve. You'd do that with 'predictor' variables $t$ and $t^2$ for a quadratic, adding $t^3$ for a cubic.
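A rough sketch of the polynomial idea, again with the standard library only: build the least-squares normal equations for the predictors $1, t, t^2, \dots$ and solve them by Gaussian elimination. The names `solve`, `fit_polynomial` and `sse` are made up for this example. Normal equations are fine for low degrees and small time values like the hours used here, but a numerical library would be a safer choice for anything larger.

```python
data = [71.911509, 71.928993, 71.907168, 71.947290, 71.915951, 71.925802,
        71.930942, 71.894793, 71.796717, 71.723729, 71.675488, 71.693242,
        71.690964, 71.687208, 71.692363, 71.683531, 71.642414]
timestamps = [1449158406112, 1449159306393, 1449160207362, 1449161108128,
              1449162006643, 1449162905018, 1449163805674, 1449164704940,
              1449165604534, 1449166504159, 1449167404237, 1449168304924,
              1449169204815, 1449170104565, 1449171003940, 1449171903956,
              1449172804378]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_polynomial(t, y, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] for y ~ sum(c_k * t**k)."""
    powers = range(degree + 1)
    gram = [[sum(ti ** (i + j) for ti in t) for j in powers] for i in powers]
    rhs = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in powers]
    return solve(gram, rhs)


def sse(coeffs, t, y):
    """Sum of squared residuals of the fitted polynomial."""
    fitted = [sum(c * ti ** k for k, c in enumerate(coeffs)) for ti in t]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


hours = [(ts - timestamps[0]) / 3_600_000 for ts in timestamps]
for degree in (1, 2, 3):
    coeffs = fit_polynomial(hours, data, degree)
    print(degree, [round(c, 5) for c in coeffs], round(sse(coeffs, hours, data), 6))
```

The residual sum of squares can only go down as the degree goes up, so a smaller value alone does not show that the higher-degree curve is a better description of the trend.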

If you notice cycles (periodic up and down trends) in your metric when plotted against time, then you need to explore some of the methods of time-series analysis.

Addendum: I looked at some plots of your data. There seems to be a 'change point' about halfway along. There are tests for this, but time series is not an area in which I feel comfortable giving detailed advice. I'm wondering if all your data sequences show this kind of behavior, or whether you showed this particular one because of its unusual structure. Also, I wonder if detecting the location of a change point (beyond what is obvious from the plots) is useful for you.
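If locating the change point does matter, one very simple way to do it (my own assumption for illustration, not one of the formal tests mentioned above) is to try every split of the series into two segments, model each segment by its own mean, and keep the split with the smallest total squared error. The function name `best_mean_split` and the `min_size` parameter are hypothetical.

```python
data = [71.911509, 71.928993, 71.907168, 71.947290, 71.915951, 71.925802,
        71.930942, 71.894793, 71.796717, 71.723729, 71.675488, 71.693242,
        71.690964, 71.687208, 71.692363, 71.683531, 71.642414]


def segment_sse(segment):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def best_mean_split(y, min_size=2):
    """Return (k, sse) where y[:k] and y[k:] are best described by two separate means.

    min_size keeps both segments from shrinking to a single point.
    """
    candidates = range(min_size, len(y) - min_size + 1)
    scored = ((k, segment_sse(y[:k]) + segment_sse(y[k:])) for k in candidates)
    return min(scored, key=lambda pair: pair[1])


split, total_sse = best_mean_split(data)
before = sum(data[:split]) / split
after = sum(data[split:]) / (len(data) - split)
print(f"split before index {split}: mean {before:.4f} -> mean {after:.4f}")
```

This only finds the most plausible location of a single shift in level. It says nothing about whether the shift is statistically significant, which is what a proper change-point test would address.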


A 'control chart' of individual data points shows that the process changes from 'out of control' above the 'upper control limit' (UCL), which is 2 SD above the mean, to out of control below the 'lower control limit' (LCL). (Characteristically for control charts, SD is estimated using the range of the data.) Red dots indicate out-of-control points, and the 1s indicate the reason (out of a standard numbered list) for so tagging each value.
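The limits of such a chart can be approximated in a few lines. This sketch assumes one common convention for charts of individual values: the SD is estimated as the average moving range of consecutive points divided by the constant 1.128. The software that drew the chart may have used a different estimate. The multiplier `k` is a parameter; it defaults to the 2 SD quoted above, while many references use 3.

```python
data = [71.911509, 71.928993, 71.907168, 71.947290, 71.915951, 71.925802,
        71.930942, 71.894793, 71.796717, 71.723729, 71.675488, 71.693242,
        71.690964, 71.687208, 71.692363, 71.683531, 71.642414]

# Standard constant that converts a mean moving range of span 2 into an SD estimate.
D2 = 1.128


def individuals_limits(y, k=2.0):
    """Return (LCL, center, UCL) for a chart of individual values.

    Assumption: sigma is estimated from the average moving range, not from
    the overall sample standard deviation.
    """
    center = sum(y) / len(y)
    moving_ranges = [abs(b - a) for a, b in zip(y, y[1:])]
    sigma = (sum(moving_ranges) / len(moving_ranges)) / D2
    return center - k * sigma, center, center + k * sigma


lcl, center, ucl = individuals_limits(data)
above = [i for i, v in enumerate(data) if v > ucl]
below = [i for i, v in enumerate(data) if v < lcl]
print(f"LCL={lcl:.4f} center={center:.4f} UCL={ucl:.4f}")
print("above UCL:", above)
print("below LCL:", below)
```

For this particular sample the same points are flagged with `k=2.0` and `k=3.0`: the early points fall above the UCL and the late ones below the LCL, with only the middle point inside the limits.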


A 'runs' chart (not posted) shows a notable sequence of runs 'above the median' followed by a similar sequence 'below'.
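The runs themselves are easy to count. The sketch below labels each point as above, on, or below the sample median and groups consecutive equal labels; the label strings are arbitrary choices for this example.

```python
import statistics
from itertools import groupby

data = [71.911509, 71.928993, 71.907168, 71.947290, 71.915951, 71.925802,
        71.930942, 71.894793, 71.796717, 71.723729, 71.675488, 71.693242,
        71.690964, 71.687208, 71.692363, 71.683531, 71.642414]

median = statistics.median(data)


def side(value):
    """Label a value by its position relative to the median."""
    if value > median:
        return "above"
    if value < median:
        return "below"
    return "on"


# Collapse consecutive identical labels into (label, run length) pairs.
runs = [(label, len(list(group))) for label, group in groupby(map(side, data))]
print(median, runs)
```

For the sample in the question this gives a run of 8 points above the median, the median point itself, and then a run of 8 points below it, which is the pattern described above.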

There is nothing profound in either chart beyond what is clear from the initial unembellished 'time series' plot of 'data' against (essentially equally spaced) 'timestamps'. I would start by Googling some of the key terms and looking in an introductory applied book on time series. Posting this problem on our sister site, Cross Validated (stats.stackexchange.com), might get you advice from a time series expert.