Calculate z-score/sd/mean of a certain time frame and use z-scores at peak events to see change points


I have a variable Var. You can imagine the variable as a kind of time series where some observations with high values indicate an event (e.g., at point t5 the value goes up to 20, while from t0 to t4 it is at about 5).

Now I'm thinking about calculating the standard deviation and mean of the instances that fall before the event instances, and then calculating the z-score for the whole data set based on this sd and mean. My reasoning: I want to capture the "normal" standard deviation and mean, and I assume that a drastic change in z-score can help me see the event right at its onset. However, I think this is very basic, and I'm wondering whether you know of research on this topic or whether this approach has a special name. I can't find anything on Google Scholar, but I'm pretty sure someone has done this before. Maybe it is even an established technique?
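In code, the idea would look roughly like this (the function name, the baseline cutoff, and the toy data are illustrative, not part of any established method):

```python
import numpy as np

def zscores_from_baseline(x, baseline_end):
    """Compute z-scores for the whole series, using only the
    pre-event ("normal") window x[:baseline_end] for mean and sd."""
    baseline = np.asarray(x[:baseline_end], dtype=float)
    mu = baseline.mean()
    sigma = baseline.std(ddof=1)  # sample sd of the normal period
    return (np.asarray(x, dtype=float) - mu) / sigma

# Toy series: flat around 5 until t5, where the event pushes it to 20.
series = [5.1, 4.9, 5.0, 5.2, 4.8, 20.0, 19.5, 5.0]
z = zscores_from_baseline(series, baseline_end=5)
```

Because the sd comes only from the quiet window, the event point gets an extreme z-score instead of being masked by its own contribution to the spread.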

Do you know more about that?

Best!

There are 2 answers below.

Answer 1:

I don't know if I understood your question correctly, but I think you are looking for what they call 'anomaly detection'. There are several ways to do this, and one basic approach is indeed to put an upper and lower limit based on your average and standard deviation, e.g. $\text{average} + 3 \cdot \text{standard deviation}$ and $\text{average} - 3 \cdot \text{standard deviation}$. This catches incidental high points in your time series. However, a more general approach is to look at the underlying distribution of your data and not just rely on a normal distribution. I recently used machine learning algorithms to do this, so that could be an option.
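A minimal sketch of that mean ± 3·sd band (the data are made up; note that in a short window a single outlier inflates the sd enough to partly mask itself, so the window should be reasonably long):

```python
import numpy as np

def sigma_band_anomalies(x, k=3.0):
    """Return indices of points outside mean ± k*sd.
    k=3 is the conventional choice; tune it to your data."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    return np.where((x < mu - k * sigma) | (x > mu + k * sigma))[0]

# Mostly values near 5, with one incidental spike at the end.
data = [5, 5, 6, 4, 5, 5, 6, 4, 5, 5,
        6, 4, 5, 5, 6, 4, 5, 5, 5, 30]
idx = sigma_band_anomalies(data)
```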

If you want to know whether your pattern is changing, for example from a stationary pattern to a trend, you might be better off with Trigg's tracking signal (https://en.wikipedia.org/wiki/Tracking_signal).
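A sketch of Trigg's tracking signal: the ratio of the smoothed forecast error to the smoothed absolute error, which heads toward ±1 when errors keep the same sign (i.e., the series has drifted away from the model). Here simple exponential smoothing supplies the forecast; the smoothing constants are illustrative choices.

```python
def triggs_tracking_signal(x, alpha=0.2, beta=0.1):
    """Tracking signal per observation: smoothed error / smoothed |error|.
    Values near +-1 indicate a persistent bias (pattern change)."""
    forecast = x[0]
    e_s, mad = 0.0, 1e-9   # smoothed error, smoothed absolute error
    signal = []
    for obs in x:
        err = obs - forecast
        e_s = beta * err + (1 - beta) * e_s
        mad = beta * abs(err) + (1 - beta) * mad
        signal.append(e_s / mad)
        forecast += alpha * err   # update the SES forecast
    return signal

# Stationary series, then an upward trend: the signal climbs toward 1.
series = [5.0] * 20 + [5.0 + 0.5 * t for t in range(1, 21)]
ts = triggs_tracking_signal(series)
```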

A very good book on time series forecasting and thus handling such anomalies is 'forecasting: methods & applications' by Makridakis, Wheelwright & Hyndman: https://robjhyndman.com/forecasting/

Answer 2:

There are any number of approaches to this problem. I would tend to think that, while machine learning could certainly work for you, it might be overkill. Why not use a simple threshold indicator? When your signal goes above a threshold (on a rising edge, say), the event has happened. Peak detectors might also be of value.
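A rising-edge threshold check is only a few lines; the threshold value itself is domain knowledge you have to supply, not something the code chooses:

```python
def rising_edges(x, threshold):
    """Indices where the signal crosses the threshold upward:
    previous sample at or below it, current sample above it."""
    return [i for i in range(1, len(x))
            if x[i - 1] <= threshold < x[i]]

# Two events against a baseline of roughly 5.
series = [5, 5, 6, 4, 20, 21, 5, 5, 19, 4]
edges = rising_edges(series, threshold=10)
```

Reporting only the crossing (rather than every sample above the threshold) gives one index per event onset, which matches the goal of seeing the event at the beginning.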

The $z$ score is just going to re-scale the problem, not solve the problem. In addition, you write that you want to calculate the mean and standard deviation of the instances that happen before the event; the problem here is that you're trying to detect those events! It sounds circular to me.

Here's another approach that can be very powerful (and is actually a hidden machine-learning approach): fit a low-order polynomial to chunks of your data and subtract the result from the original data. Because least-squares fitting sums over all points, it gives little weight to any single point, with the result that isolated spikes survive in the residual and are much easier to detect. This approach is especially useful if your baseline is non-constant.
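A sketch of that chunked polynomial detrending (the degree and chunk size here are arbitrary choices, and the test signal is synthetic):

```python
import numpy as np

def detrend_residuals(x, degree=2, chunk=50):
    """Fit a low-order polynomial to each chunk and subtract it.
    The least-squares fit barely moves for a single outlier, so
    spikes remain in the residual while a drifting baseline vanishes."""
    x = np.asarray(x, dtype=float)
    resid = np.empty_like(x)
    for start in range(0, len(x), chunk):
        seg = x[start:start + chunk]
        t = np.arange(len(seg))
        # Guard against chunks shorter than the polynomial degree.
        coeffs = np.polyfit(t, seg, deg=min(degree, len(seg) - 1))
        resid[start:start + chunk] = seg - np.polyval(coeffs, t)
    return resid

# Slowly rising (non-constant) baseline plus one spike at t=60.
t = np.arange(100)
signal = 0.05 * t + 0.001 * t ** 2
signal[60] += 15.0
r = detrend_residuals(signal, degree=2, chunk=50)
```

After detrending, a simple threshold on the residual (as in the first suggestion above) picks out the spike even though the raw baseline keeps climbing.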