I am learning about Dynamic Time Warping algorithm. It is capable of finding the optimal warping path (a mapping with some extra conditions) between two time series $x_1, ... x_N$ and $y_1, ... y_M$. Instead of using DTW algorithm one would iterate over all possible warping paths and pick the one with the lowest sum of local distances, but it would lead to an unacceptable time complexity which is, according to this paper "exponential in lengths N and M".
I'd like to know why such time complexity.
Well, think about it this way. For every pair of indices $(i, j)$ you have to make the choice whether or not to include it in your path. So, how many such pairs are there in total? And if you have 2 options for each of those pairs, how many choices did you make in the end?