Please forgive the question if it sounds trivial or naive; I come from a computer science background, not electrical or computer engineering.
I work with a GPS trajectory dataset for classification. The data was collected at a sampling rate of one sample per second.
time-domain:
Initially, I worked with time-domain features only, so I extracted two point-level features: `speed` (the rate of change of distance between consecutive locations over time) and `acceleration` (the rate of change of speed over time).
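For concreteness, the two point-level features could be computed roughly as follows. This is only a sketch, not the exact code used: the haversine distance, the function names `haversine_m` and `point_features`, and the inputs in decimal degrees are all assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def point_features(lat, lon, dt=1.0):
    """Return (speed, acceleration) for one trip sampled every dt seconds."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # Distance between each pair of consecutive fixes
    dist = haversine_m(lat[:-1], lon[:-1], lat[1:], lon[1:])
    speed = dist / dt            # m/s, one value fewer than the number of fixes
    accel = np.diff(speed) / dt  # m/s^2, two values fewer than the number of fixes
    return speed, accel
```

Note that each differencing step shortens the series by one sample, so the two features need to be aligned (for example by trimming) before segmentation.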
I then divided each user's trip into fixed-length segments of 100 time steps (100 data points per segment). Finally, I calculated 11 statistics of each feature (min, max, mean, median, standard deviation, etc.) within each fixed-length segment, giving a total of 22 features per segment. I used these to train a classifier (a decision tree).
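The segmentation and per-segment statistics can be sketched as below. Only the five statistics named above are shown (the full set of 11 is not listed here), and dropping the incomplete final segment is an assumption of this sketch; `segment` and `segment_stats` are hypothetical helper names.

```python
import numpy as np


def segment(series, length=100):
    """Split a 1-D series into non-overlapping segments of `length` samples.

    Assumption: trailing samples that do not fill a whole segment are dropped.
    """
    series = np.asarray(series, dtype=float)
    n_segments = len(series) // length
    return series[: n_segments * length].reshape(n_segments, length)


def segment_stats(segments):
    """Summary statistics per segment; returns shape (n_segments, 5)."""
    return np.column_stack([
        segments.min(axis=1),
        segments.max(axis=1),
        segments.mean(axis=1),
        np.median(segments, axis=1),
        segments.std(axis=1),
    ])
```

Applying `segment_stats` to the speed segments and to the acceleration segments, then stacking the two results column-wise, gives one row of time-domain features per segment.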
frequency-domain:
I then wanted to try a combination of time-domain and frequency-domain features when building my model, to see the impact. Using the fast Fourier transform, I proceeded as follows:
- Take the original two-feature segments (the fixed-length segments of 100 data points).
- Use `scipy.fft.fft` to obtain the corresponding frequency spectrum of each feature, thus:

  ```python
  import numpy as np
  from scipy.fft import fft

  N = 100  # number of samples in a segment
  T = 1    # sample spacing (1 second)

  def fft_transform(data):
      """Compute the FFT magnitude of each segment's features along axis 2."""
      # The transform runs along axis 2, so that axis should hold the N time samples
      fourier = fft(data, axis=2)
      magnitude = np.abs(fourier)
      return magnitude
  ```

  The figure below illustrates an example input in the time and frequency domains.

- Take only the positive-frequency values of the FFT (since the FFT of real-valued input is symmetric).
- Add the frequency-domain values of each feature to its time-domain values. The result is 11 time-domain features for `speed` (similarly `acceleration`) and 50 frequency-domain features for `speed` (similarly `acceleration`). Total: 122 features per input.
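The last two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the original code: it uses `scipy.fft.rfft`, which returns only the non-negative frequencies of a real signal, and it assumes the 50 retained bins are bins 1 to 50 (the DC bin is dropped), since the post does not say which 50 were kept. `positive_spectrum` and `combine` are hypothetical names.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

N = 100  # samples per segment
T = 1.0  # sample spacing in seconds

# Frequencies (Hz) of the retained bins: 0.01, 0.02, ..., 0.5
freqs = rfftfreq(N, d=T)[1:]


def positive_spectrum(segments):
    """Magnitudes of the positive-frequency bins for input shaped (n_segments, N)."""
    magnitude = np.abs(rfft(segments, axis=-1))  # N // 2 + 1 bins, DC up to Nyquist
    return magnitude[:, 1:]                      # assumption: drop DC, keep 50 bins


def combine(time_stats, segments):
    """Concatenate time-domain statistics with the spectrum, per segment."""
    return np.hstack([time_stats, positive_spectrum(segments)])
```

With 11 time-domain statistics per feature, `combine` yields 11 + 50 = 61 columns for speed and 61 for acceleration, which gives the 122 features per segment described above.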
The new results showed a significant negative impact on overall model performance, as measured by the usual metrics (precision, recall, F1).
In the next round of experiments, I would like to add a small set of significant FFT descriptors as frequency features instead of using all the FFT values as described above. I initially thought of computing the same summary statistics as I did for the time domain, but was advised that these would not make good features for an FFT spectrum (and that doing so might also discard important features or patterns).
I can confirm that: the results I obtained that way were even worse. Some classes were never predicted correctly, not even for a single sample.
That said, I would like to have your advice regarding:
- What constitutes a good set of FFT descriptors to select and add to the time-domain features, instead of the same statistics used for the time features?
- Any pointers to code (Python) for your suggestion, considering my script above?
