How should I calculate a rolling autocorrelation?

I have an array of data $ \mathbf{y} \in \mathbb{R}^n $, and I need to calculate the lag-1 autocorrelation between sections of this array 7 elements long.

For all intents and purposes, we can imagine reshaping this array into an $ n/7 \times 7$ matrix, and then taking the correlation between each pair of consecutive rows of the matrix.

Here is my problem: I'm unsure how to compute the autocorrelation (AC). On one hand, I could use the following formula for rows $i$ and $i+1$:

$$ r = \dfrac{ \sum_{j=1}^{7} (\mathbf{y}_{i,j}- \bar{\mathbf{y}})(\mathbf{y}_{i+1,j}- \bar{\mathbf{y}}) }{\left(\sum_{j=1}^{7}(\mathbf{y}_{i,j} - \bar{\mathbf{y}})^2\right)^{0.5} \left( \sum_{j=1}^{7}(\mathbf{y}_{i+1,j} - \bar{\mathbf{y}})^2 \right)^{0.5} } $$

Here, $ \bar{\mathbf{y}}$ is the mean of the entire vector, not the mean of a row. I do this because every row of the matrix comes from the same signal. This would require me to write my own function, which is not out of the realm of possibility; I just trust built-in functions from stats toolkits a little more than my own understanding.
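For reference, the formula above could be sketched with NumPy roughly as follows. This is only a sketch under stated assumptions: the function name `blockwise_lag1_corr` and the `width` parameter are made up for illustration, and trailing samples that do not fill a complete block are simply dropped from the blocks (they still contribute to the global mean).

```python
import numpy as np

def blockwise_lag1_corr(y, width=7):
    """Correlate each block of `width` samples with the next block,
    centring every block on the mean of the whole signal."""
    y = np.asarray(y, dtype=float)
    n_blocks = y.size // width
    # Keep only complete blocks and reshape to (n_blocks, width).
    blocks = y[: n_blocks * width].reshape(n_blocks, width)
    # Subtract the global mean of y, not the per-row means.
    centred = blocks - y.mean()
    a, b = centred[:-1], centred[1:]          # row i and row i+1
    num = (a * b).sum(axis=1)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return num / den                          # one value per pair of rows
```

The result has one entry per pair of consecutive rows, so a signal with `k` complete blocks gives `k - 1` values. A block that equals the global mean everywhere would make the denominator zero, which this sketch does not guard against.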

The other option is to use a built-in function, but I am unsure whether the function will calculate the mean of each row and use that in the calculation. As I said above, since each row comes from the same signal, the mean used in the calculation should be the mean of the entire signal and not just the row.

Thoughts on the issue are appreciated. I'm writing this in Python, if that is relevant.

So how should I go about computing this?

Best answer

The rolling autocorrelation can be computed like this in Python:

pandas.rolling_apply(your_data['column'], 7, lambda x: pandas.Series(x).autocorr(1))

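rolling_apply() slides a window of 7 rows over your data, moving forward one row at a time, and runs the autocorr() function on each window with the lag of your choice (1 in this case). Note that autocorr() computes a Pearson correlation between the window and its lagged copy, so it centres on the means of the values inside each window rather than on the mean of the entire signal.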

The above applies only to pandas v0.18 or lower. In the current API, the same thing is written as:

your_data['column'].rolling(7).apply(lambda x: pandas.Series(x).autocorr(1))
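As a quick sanity check, the newer call can be run on made-up data and compared against a hand computation for one window. This is a sketch only: the random data and the names `rolling_ac`, `last` and `manual` are assumptions for illustration, and `raw=False` is passed so that each window arrives as a Series (which makes the extra `pandas.Series(x)` wrapper unnecessary).

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for your_data['column'].
rng = np.random.default_rng(0)
your_data = pd.DataFrame({"column": rng.normal(size=30)})

# Lag-1 autocorrelation inside every sliding window of 7 rows.
# raw=False hands each window to the function as a pandas Series.
rolling_ac = your_data["column"].rolling(7).apply(
    lambda x: x.autocorr(1), raw=False
)

# Hand check for the last window: Pearson correlation between the
# window without its last value and the window without its first value.
last = your_data["column"].iloc[-7:].to_numpy()
manual = np.corrcoef(last[:-1], last[1:])[0, 1]

print(rolling_ac.iloc[-1], manual)
```

The first 6 entries of `rolling_ac` are NaN because those windows are not yet full. If only non-overlapping sections of 7 are wanted, one option is to keep every 7th value, for example `rolling_ac.iloc[6::7]`. Keep in mind that this measures the lag-1 autocorrelation within each window, which is not the same quantity as the row-to-row formula in the question.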