Why is $X = \{x_n\} \subset \mathbb{R}^d,\ n = 1, \dots, N$ used to represent a sequence of audio frames?


I'm reading about audio processing, but I'm not well versed in the mathematical notation used in research papers. In this particular paper, $X = \{x_n\} \subset \mathbb{R}^d,\ n = 1, \dots, N$ represents a sequence of observations (frames/samples) from an audio signal.

“Let $X = \{x_n\} \subset \mathbb{R}^d,\ n = 1, \dots, N$ be the observation sequence extracted from the audio signal, where $n$ is a frame index. Let $L$ be the number of past observations, we define a dynamic descriptor sequence $Y = \{y_n\} \subset \mathbb{R}^\alpha,\ n = L + 1, \dots, N$ as $y_n = D(x_{n-L}, x_{n-L+1}, \dots, x_n)$ where $D$ is a smoothing function that takes into account $L$ observations and $\alpha$ is the new dimension after the transform.”
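To make the quoted definition concrete, here is a minimal Python sketch of one way to read it. It assumes each frame $x_n$ is stored as a list of $d$ numbers, and it uses an element-wise mean as a stand-in for $D$. The paper does not say what $D$ is, so `mean_descriptor` and `dynamic_descriptors` are hypothetical names chosen for illustration, and with this particular choice $\alpha = d$.

```python
import random


def mean_descriptor(window):
    """Hypothetical stand-in for D: element-wise mean of the frames in the window.

    With this choice the output has the same length as each frame (alpha == d).
    """
    d = len(window[0])
    return [sum(frame[k] for frame in window) / len(window) for k in range(d)]


def dynamic_descriptors(X, L, D=mean_descriptor):
    """Return y_n = D(x_{n-L}, ..., x_n) for n = L+1, ..., N (1-based n)."""
    N = len(X)
    # 0-based index i corresponds to frame n = i + 1, so i runs from L to N - 1.
    # Each slice X[i - L : i + 1] holds the current frame plus the L frames before it.
    return [D(X[i - L:i + 1]) for i in range(L, N)]


# Toy data: N frames, each a point in R^d (random values stand in for real features).
N, d, L = 6, 3, 2
random.seed(0)
X = [[random.random() for _ in range(d)] for _ in range(N)]
Y = dynamic_descriptors(X, L)

print(len(X), len(X[0]))  # number of frames, dimension of each frame
print(len(Y), len(Y[0]))  # number of descriptors, dimension of each descriptor
```

Under these assumptions, `X` has $N$ entries of length $d$ and `Y` has $N - L$ entries, because the first $L$ frames do not have enough history to form a full window.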

I understand that X is a SET of audio frames, with the sequence of frames beginning at 1 and ending at N.

What I don't understand is why $\{x_n\}$ is written as a subset of the set of real numbers complement d. Not sure how to translate that intuitively. What's the implication for a set consisting of a series of audio frames?

"α is the new dimension after the transform"

I can infer from this that d is also a dimension.