How to determine notable change in a sequence of numbers?


I have a stream of numbers coming in at random intervals (measurements, if you will). Neighboring values are typically very similar, with only small differences, but from time to time the values suddenly spike upwards or downwards, or jump/fall to a new equilibrium and remain there. I'm trying to figure out a formula to detect those sudden changes or spikes.

Initially, I was thinking about using the arithmetic mean of the last n values and a fixed percentage difference from that mean to detect changes (and ignore small differences). However, the average doesn't change quite as fast as I would like, even if I weight it, so now I'm thinking of using the standard deviation (or three of them) to detect outliers, because:

In a normal distribution, 68% of the observations fall within one standard deviation. 95% of the observations fall within two standard deviations. 99.7% of the observations fall within three standard deviations.

This is much better, but even when I use three standard deviations, some values fall outside the 99.7% range without being significant: their difference from the mean is not big enough for them to count as real outliers.
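For concreteness, here is a minimal sketch of the three-standard-deviation check described above. The function name `three_sigma_flags`, the window size `n`, and the multiplier `k` are assumptions made for illustration only; the check compares each new value with the mean and sample standard deviation of the previous `n` values.

```python
from collections import deque
from statistics import mean, stdev


def three_sigma_flags(values, n=20, k=3.0):
    """Flag values lying more than k sample standard deviations
    from the mean of the previous n values (hypothetical helper)."""
    window = deque(maxlen=n)  # holds only the most recent n values
    flags = []
    for x in values:
        if len(window) >= 2:  # stdev needs at least two points
            mu = mean(window)
            sigma = stdev(window)
            flags.append(abs(x - mu) > k * sigma)
        else:
            flags.append(False)  # not enough history yet
        window.append(x)  # the new value joins the window after the test
    return flags


readings = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 50.0]
print(three_sigma_flags(readings, n=5))
```

Because the threshold is a multiple of the standard deviation, it scales with the data: multiplying every reading by the same positive factor should leave the flags unchanged. The window size `n` and the multiplier `k` are still choices that have to be made.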

Is there a better measure which would allow me to detect sudden changes, but ignore small variations? The key requirement is that I don't want to have any constants in the calculation; it has to work when the average difference between the values is 100 as well as when it's 0.01. In the former case, only an outlier with a difference of at least, say, 150 from the mean is significant, while in the latter, an outlier with a difference of only 0.02 from the mean is important and should be detected.


There is 1 solution below


Did you try checking whether a new number $a_n$ falls in the range $[(1-p)a_{n-1}, (1+p)a_{n-1}]$, with $p$ some small number, for example $0.05$?

That would check whether the new number is fairly close to the last one you read. Does it work for you? If you do not wish to let it change too much over several steps, you could also check whether it lies in the ranges $[(1-p)^2a_{n-2}, (1+p)^2a_{n-2}], \cdots, [(1-p)^ia_{n-i}, (1+p)^ia_{n-i}]$. You could then fine-tune your criterion to find the best $p$ and the number $i$ of tests to make.
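A minimal sketch of this check, assuming the past readings are kept in a list: the function name `within_relative_bands` and the default values of `p` and `i` are hypothetical, and swapping the bounds for negative readings is an added assumption, since the ranges above are written for positive values.

```python
def within_relative_bands(history, new_value, p=0.05, i=3):
    """Return True if new_value lies in every band
    [(1 - p)**k * a_{n-k}, (1 + p)**k * a_{n-k}] for k = 1..i,
    where a_{n-k} is the k-th most recent reading in history."""
    for k in range(1, min(i, len(history)) + 1):
        ref = history[-k]  # a_{n-k}
        lo, hi = (1 - p) ** k * ref, (1 + p) ** k * ref
        if lo > hi:  # assumption: negative reading, so swap the bounds
            lo, hi = hi, lo
        if not lo <= new_value <= hi:
            return False  # outside at least one band: notable change
    return True


history = [100.0, 101.0, 100.0, 102.0]
print(within_relative_bands(history, 103.0))  # small step
print(within_relative_bands(history, 120.0))  # sudden jump
```

Since each band is proportional to the reading it is built from, the same $p$ applies whether the readings are around 100 or around 0.01. Note that a band collapses to a single point when a reading is exactly zero, so data that crosses zero would need separate handling.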

Does this do the trick for you? Do you need any additional properties?