I'm reading about audio processing, but I'm not well versed in the mathematical notation used in research papers. In this particular paper, $X = \{x_n\} \subset \mathbb{R}^d$, $n = 1, \ldots, N$, represents a sequence of observations (frames/samples) from an audio signal.
“Let $X = \{x_n\} \subset \mathbb{R}^d$, $n = 1, \ldots, N$ be the observation sequence extracted from the audio signal, where $n$ is a frame index. Let $L$ be the number of past observations, we define a dynamic descriptor sequence $Y = \{y_n\} \subset \mathbb{R}^\alpha$, $n = L + 1, \ldots, N$ as $y_n = D(x_{n-L}, x_{n-L+1}, \ldots, x_n)$ where $D$ is a smoothing function that takes into account $L$ observations and $\alpha$ is the new dimension after the transform.”
I understand that $X$ is a set of audio frames, with the frame index $n$ running from 1 to $N$.
What I don't understand is why $\{x_n\}$ is written as a subset of $\mathbb{R}^d$, which I read as the set of real numbers complement d. I'm not sure how to interpret that intuitively. What does it imply for a set consisting of a series of audio frames?
"α is the new dimension after the transform"
I can infer from this that d is also a dimension.
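If that inference is right, here is a minimal sketch of how I picture the definition in NumPy. It rests on assumptions that are mine, not the paper's: each frame $x_n$ is a row of $d$ real numbers, so $X$ is an $N \times d$ array, and $D$ is just the mean over the window (which makes $\alpha = d$). The function name `dynamic_descriptors` and the toy sizes are made up for illustration.

```python
import numpy as np

def dynamic_descriptors(X, L, D=None):
    """X has shape (N, d), one row per frame. Returns Y with shape (N - L, alpha)."""
    if D is None:
        # Assumed smoothing function: mean over the window, so alpha == d.
        # The paper does not say what D actually is.
        D = lambda window: window.mean(axis=0)
    N = X.shape[0]
    # 0-based row n corresponds to the paper's 1-based frame n + 1.
    # Each window holds x_{n-L}, ..., x_n, i.e. L + 1 frames.
    return np.stack([D(X[n - L : n + 1]) for n in range(L, N)])

# Toy data: N = 6 frames, each a vector of d = 3 real numbers.
N, d, L = 6, 3, 2
X = np.arange(N * d, dtype=float).reshape(N, d)
Y = dynamic_descriptors(X, L)
print(X.shape, Y.shape)
```

Under this reading, $Y$ has $N - L$ entries, which matches the index range $n = L + 1, \ldots, N$ in the quote.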