Sound FFT data to PCM data


I have only a fairly naive understanding of the FFT. Is my naive interpretation of how to recover the time-series data (PCM data) from FFT data correct? If not, what is wrong with it?

I will describe my understanding using a concrete example:

I have some FFT data produced from a sound, with 257 values per frame and one frame every 10 ms. I'm not sure exactly how it was calculated. For example, for a sound of 1.21 s, my FFT data is $$FFT \in \mathbb{R}^{112 \times 257} .$$

I want to reproduce the original PCM data so I can play the sound.

I think the first of the 257 values in each frame might be something else, which would leave me with 256 values, since I guess 256 is a common FFT size?
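One way to test the guess about 257 vs. 256: a 512-sample frame passed through NumPy's real FFT (`np.fft.rfft`) yields exactly 512/2 + 1 = 257 values. This is only a sketch under the assumption that the data was produced that way (the question doesn't say); the random frame is a stand-in for real audio. Under that assumption, the first value is the DC (0 Hz) term of the same spectrum rather than something unrelated, and the last one is the Nyquist bin.

```python
import numpy as np

# Assumption: each 257-value row comes from a 512-point real FFT.
# np.fft.rfft keeps only the N // 2 + 1 non-redundant bins of a real signal.
N = 512
frame = np.random.default_rng(0).standard_normal(N)  # stand-in for one audio frame

spectrum = np.fft.rfft(frame)
print(spectrum.shape)  # (257,)

# Bin 0 is the DC (0 Hz) component: for real input it equals the sum of the samples
# and has no imaginary part. Bin N // 2 is the Nyquist bin.
print(np.isclose(spectrum[0].real, frame.sum()), np.isclose(spectrum[0].imag, 0.0))
```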

I thought that I could calculate the PCM data for a sample rate of 44.1 kHz like this:

$$ PCM_p = \sum_{i=1}^{257} v_{p,i} \cdot \sin(t_p \cdot f_{i-1} \cdot 2 \cdot \pi) $$

where $p$ is the sample index and $t_p$ the time of sample $p$,

$$ t_p = \frac{p}{44100} , $$

$v_{p,i}$ is the amplitude, which I take directly from $FFT$ by linearly interpolating between the current frame $j_p$ and the previous frame $j_p - 1$ (assuming $j_p > 0$), i.e.

$$ j_p = \lfloor t_p \cdot 100 \rfloor , $$

$$ \lambda_p = t_p \cdot 100 - j_p \in [0,1) , $$

$$ v_{p,i} = FFT_{j_p,i} \cdot \lambda_p + FFT_{j_p-1,i} \cdot (1 - \lambda_p) , $$

and $f_i$ (for $i = 0, \dots, 256$) is the frequency, which I assume to be evenly spaced in $[F_{start},F_{end}]$, i.e.

$$ f_i = F_{start} + i \cdot \frac{F_{end} - F_{start}}{256} . $$
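A minimal NumPy sketch of the sum-of-sines scheme described above, in case it helps to listen to what it actually produces. Assumptions not taken from the question: `naive_resynthesis` is a hypothetical helper name, the array holds amplitudes only (no phase), and $F_{start}$/$F_{end}$ are supplied by the caller. Out-of-range frame indices are clipped instead of requiring $j_p > 0$. Because every sine starts at phase 0, this will generally not reproduce the original waveform, even if the amplitudes and frequencies are right.

```python
import numpy as np

def naive_resynthesis(fft_mag, f_start, f_end, sample_rate=44100, frames_per_sec=100):
    """Sum of sines following PCM_p = sum_i v_{p,i} * sin(2 pi f_{i-1} t_p).

    Assumptions: fft_mag has shape (num_frames, num_bins) and holds amplitudes only,
    frames are 1 / frames_per_sec seconds apart, and bin i sits at
    f_i = f_start + i * (f_end - f_start) / (num_bins - 1).
    """
    num_frames, num_bins = fft_mag.shape
    freqs = f_start + np.arange(num_bins) * (f_end - f_start) / (num_bins - 1)

    num_samples = num_frames * sample_rate // frames_per_sec
    t = np.arange(num_samples) / sample_rate     # t_p = p / sample_rate
    pos = t * frames_per_sec
    j = np.floor(pos).astype(int)                # j_p = floor(100 * t_p)
    lam = pos - j                                # lambda_p in [0, 1)
    # Clip so that j_p and j_p - 1 are always valid rows of fft_mag.
    cur = np.clip(j, 0, num_frames - 1)
    prev = np.clip(j - 1, 0, num_frames - 1)

    pcm = np.zeros(num_samples)
    for i in range(num_bins):                    # one bin at a time to limit memory use
        v = fft_mag[cur, i] * lam + fft_mag[prev, i] * (1.0 - lam)  # v_{p,i}
        pcm += v * np.sin(2.0 * np.pi * freqs[i] * t)
    return pcm

# Usage: a single active bin (index 10) for 10 frames gives 0.1 s of a sine at
# 10 * 22050 / 256 = 861.328125 Hz, assuming F_start = 0 and F_end = 22050.
mag = np.zeros((10, 257))
mag[:, 10] = 1.0
pcm = naive_resynthesis(mag, 0.0, 22050.0)
print(pcm.shape)  # (4410,)
```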

This is all very naive. I'm also unsure about $f_i$: what are common values for $F_{start}$ and $F_{end}$?
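For what it's worth, if the 257 values are the non-redundant bins of a 512-point real FFT (an assumption, suggested only by 257 = 512/2 + 1), the bin frequencies follow from the FFT size and the sample rate $f_s$ of the analysed audio: bin $k$ sits at $k \cdot f_s / N$. In the question's notation that would mean $F_{start} = 0$ and $F_{end} = f_s / 2$. The sketch below uses NumPy's `np.fft.rfftfreq`; the 16 kHz sample rate is a hypothetical placeholder, since the original rate is unknown.

```python
import numpy as np

# Assumption: the 257 values per frame are the bins of a 512-point real FFT.
# The sample rate of the original audio is not known; 16000 Hz is a placeholder.
N = 512
fs = 16000

# Bin k is centred at k * fs / N Hz, so the bins run from 0 Hz (DC)
# up to fs / 2 (the Nyquist frequency) in steps of fs / N.
freqs = np.fft.rfftfreq(N, d=1.0 / fs)

print(len(freqs))                     # 257
print(freqs[0], freqs[1], freqs[-1])  # approximately 0, 31.25 and 8000 Hz
```

With a different analysis sample rate, only `fs` changes; for audio analysed at 44.1 kHz the top bin would be at 22050 Hz.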