Reducing a large amount of data to a smaller set


I have a sensor that can take thousands of measurements per second. It sends all these measurements (packed in a single message) over the internet, using a messaging protocol, to my service, where I need to translate the data (which arrives as bytes) into human-readable values.

The measurements come from different kinds of sensors, but I know that a large batch of measurements can sometimes contain only tiny changes from one value to the next and no spikes at all. It would be a waste of resources to store and analyze all of this data. I could reduce $\sim 2000$ measurements that look like this:

$$~2331.14 ,~ 2331.13 ,~ 2331.11,~ 2331.12 ,~ 2331.12,~ 2339.52 , ~2339.40, ~4216.51$$

to something like this: $~2331.12, ~2339.52, ~4216.51$

The problem

I can't simply put all of this unreduced data into my database; I run pre- and post-analysis on this data, and storing everything would be a waste of resources.

The solution

Reducing the data set without losing too much accuracy is the biggest challenge. I thought of doing a "current vs. previous" diff, like:

$x_1~$ is the first measurement, so save it unconditionally.

Is $~|x_1 - x_2| > 0.15~$? No; $~x_1~$ is already saved, so just discard $~x_2$.

Is $~|x_1 - x_3| > 0.15~$? No; $~x_1~$ is already saved, so just discard $~x_3$.

Is $~|x_1 - x_4| > 0.15~$? Yes; save $~x_4~$ and make it the new reference.

Is $~|x_4 - x_5| > 0.15~$? No; $~x_4~$ is already saved, so just discard $~x_5$.

But that would drastically reduce the accuracy of the data, as there is a chance of losing spikes made up of measurements that each increase only slightly over the previous one.

Is there any algorithm or already known method that I can use to achieve what I want?