Detecting increasing pulse trains

209 Views Asked by At

I have a one dimensional point process representing the times of events which is also mixed in with lots of data that I regard as noise. The interval between the events in the point process are geometrically increasing.

The total number of points is of order one million. I would like to detect (approximate) processes of this form in the data. What is a good method for doing this?

As an example, if my points are 1, 7, 8, 10, 16, 25, 28, 30, 31,50, 52 I would like to detect 7, 10, 16, 28, 52. As my values are reals and come from measurement I need this to tolerate small errors in the data.

As another example, points in such a pulse train will be of the form $t_0 + \sum_{i=1}^n a*k^{i-1}$. In the example above, $t_0 =7$, $a=3$, $k=2$.

2

There are 2 best solutions below

6
On

You might try looking for quartets $(a,b,c,d)$ where $a$ and $d$ are not too far apart, and $(a-b)(c-d)-(b-c)^2$ is relatively small. Then see whether the quartet extends to a fifth, and so on. Will you ever miss the occasional term from the dataset?

0
On

This answer is incomplete, but it's where I would start if I were assigned this problem. We know the signal we are looking for:

$$ x(t) = \delta( t - t_0 ) \ast \sum_{n=0}^{\lfloor \log(N)/\log(p) \rfloor - 1} \delta \left( t - (p^{n}-1)k \right) $$

where $\ast$ is convolution, $N$ is the number of samples, $p,k\in\mathbb{Z}_{+}$, and $t_0$ is the start of the geometric pulse train. For the example in the OP, these values are $t_0 = 7$, $k=3$, and $p=2$, which gives impulse locations at $7+\{0,3,9,21,45,\dots\}$.

The $\delta(t-t_0)$ term in the above represents an unknown shift in the sequence, so we will ignore for it now. The Fourier transform of the rest of the above is easy to find since it's just the sum of complex exponentials:

$$ X(\omega,p,k) = \sum_{n=0}^{\lfloor \log(N)/\log(p) \rfloor - 1} e^{-j\omega (p^{n}-1)k } $$

This lets us easily create a bank of matched filters. Of course, this means you have to set bounds on the domain of the signal, but since it's geometric in nature, there are are not that many reasonable values (e.g., if $p=15$, then there will only be $5$ non-zero values out of $10^6$ and in all likelihood the signal will be below the noise floor).

So, for a given $p,k$ pair, generate the following sequence:

$$ z(t) = \operatorname{FFT}^{-1} \left( \operatorname{FFT}(x) \cdot X^{\ast}(\omega,p,k) \right) $$

If $p,k$ match the values in the signal, then $z(t)$ will have a peak at $t_0$.

I think the tricky part is going to be the detection, but maybe not since you should be able to figure out the expected amplitude of the matched filter since every impulse has unit amplitude. However, several input sequences will look nearly identical and in the presence of noise may be difficult to distinguish. For instance, the example you gave of $t_0=7$, $p=2$, and $k=3$ is nearly the same as $t_0=10$, $p=2$, and $k=6$.

Below is some Matlab code that might help you get started. Like I said, this answer is incomplete, so keep that in mind.

N = 1e6;
x = zeros(N,1);

% The geometric pulse trains we are trying to detect.
% We give each an arbitrary starting point.
x( 63 + ( 5.^( 0 : log(N) / log(5) - 1 ) - 1 ) * 2 ) = 1;
x( 7 + ( 2.^( 0 : log(N) / log(2) - 1 ) - 1 ) * 3 ) = 1;

% Give the data some noise.
randinds = randperm(N);      % Random permutation for noise.
x(randinds(1:10e4)) = 1;     % Add some noise points.

Nfft = 2*N;
X = fftshift( fft( x, Nfft ) );    % DFT of input data.
f = (-0.5:1/Nfft:0.5-1/Nfft)';     % Normalized frequency axis.

% Define a matched filter bank.
p = ( 2 : 10 );
k = ( 1 : 10 );
Nmf = length( p );
K = length( k );
mf = complex( zeros( N, K, Nmf ) );

% Now we create and apply each matched filter.
for ii = 1 : Nmf
  for jj = 1 : K

    % Make the matched filter for this geometric factor.
    Q = 0;
    for n = 0 : floor( log(N) / log(p(ii)) ) - 1
      Q = Q + exp( -1j*2*pi*f * ( p(ii)^n - 1 ) * k(jj) );
    end

    % Now apply the matched filter and store the result.
    tmp = ifft( ifftshift( X .* conj(Q/norm(Q)) ) );
    mf(:,jj,ii) = tmp(1:N);

  end

end

% Create the test statistic by taking the max value of each
% (k,p) pair.
T = squeeze( max( abs( mf ), [], 1 ) );