I am currently implementing a multivariate time series pattern recognition algorithm from "Pattern Recognition and Classification for Multivariate Time Series" (Spiegel et al. 2011). It starts by using SVD to divide the time series into segments (of non-equal length) at those points, where the correlation structure of the variables seems to change. So far so good, I already succeeded by coding the first part.
The second part concerns the clustering of the obtained segments and I can't manage to get the SVD similarity measure right (will be used afterwards in agglomerative hierarchical clustering). From the paper: Classification Description. As far as I understand, the SVD similarity measure should yield a value between 0 and 1, but the values I obtain don't make much sense since they are quite large and also negative. Let's say I want to compare the two segments i and j, denoted by matrices $X_i$ & $X_j$ (variables in columns, each row is a point in time/observation):
- I start by mean centering both segments
- Then I calculate the SVD for each segment, using k dimensions: $U_{i,k}\Sigma_{i,k}V_{i,k}^T$ as well as $U_{j,k}\Sigma_{j,k}V_{j,k}^T$
- According to Spiegel et. al 2011:
Then the similarity between these subspaces can be defined as the sum of the squares of the cosines of the angles between each vector of matrix Wi,k and Wj,k, which are composed by multiplying the respective singular values and right-singular vectors ($W_{i,k} = \Sigma_{i,k}V_{i,k}$ and $W_{j,k} = \Sigma_{j,k}V_{j,k}$). This similarity measure is formalized by the following equation: $$S_{SVD} = \frac{1}{k} trace((W^T_i W_j)(W^T_j W_i))$$
First of all, I think the expression in the brackets (in text) regarding $W_{i,k}$ and $W_{j,k}$ is wrong. If we reduced the dimensionality to k dimensions for a segment $X$ that got dimensions n x m, the matrix $\Sigma$ would have dimensions k x k and $V^T$ would have dimensions k x m. Therefore, I perform $W_{i,k} = \Sigma_{i,k}V_{i,k}^T$ instead of $W_{i,k} = \Sigma_{i,k}V_{i,k}$. Otherwise I would have a dimensionality mismatch.
- After obtaining $W$ as described before, I stick to his definition of $S_{SVD}$ and multiply the (transposed) $W$ matrices of the two segments, take the trace of the result and divide it by the number of dimensions.
Here is my Python3 code (runs in a loop and fetches segments from list of all segments at beginning):
# fetch segment data that gets analyzed and mean-center it
first_segment = segmented_signals[i]
first_segment = first_segment - first_segment.mean(axis=0)
second_segment = segmented_signals[j]
second_segment = second_segment - second_segment.mean(axis=0)
# decompose segments via SVD
first_svd = np.linalg.svd(first_segment, full_matrices=False)
second_svd = np.linalg.svd(second_segment, full_matrices=False)
# extract necessary matrices of decomposition
first_svd_sigma = np.diag(first_svd[1])[0:svd_dims, 0:svd_dims]
first_svd_vh = first_svd[2][0:svd_dims, :]
second_svd_sigma = np.diag(second_svd[1])[0:svd_dims, 0:svd_dims]
second_svd_vh = second_svd[2][0:svd_dims, :]
# scaled rsv matrices
first_rsv_matrix = first_svd_sigma @ first_svd_vh
second_rsv_matrix = second_svd_sigma @ second_svd_vh
similarity = (1/svd_dims) * np.trace((first_rsv_matrix.T @ second_rsv_matrix)
@ (second_rsv_matrix.T @ first_rsv_matrix))
As I said, I seem to obtain arbitrary numbers for the similarity measure, often very high and also negative, for example the first segment with itself results in -4.21861211e+03, while I would think it should yield 1 to be interpretable as the highest level of similarity.
I am very thankful for any help and hope the questions is understandable & provides the necessary information, if not don't hesitate to ask for more detail. Here is a link to the paper: https://www.researchgate.net/profile/Stephan_Spiegel/publication/254003630_Pattern_recognition_and_classification_for_multivariate_time_series/links/0f31753a929a3d72ef000000.pdf
Greetings & Thanks in advance