Amount of sound near a specific frequency at a specific time

I have a sound signal, sampled at 48000 Hz. Now I want to know 'how much sound' there is at a specific frequency (or near that frequency) at a specific time. For example, I want to know how much sound there is at 10 s near the frequencies $20 \cdot 2^{k/3}$ Hz with $k=0,1, \dots, 30$.

I'm trying to find a connection between a sound someone is hearing and an fMRI scan of their brain while they are listening to that sound. This website says that sounds with different frequencies are processed in different areas of the brain, which is why I need to know this.

A bit of research quickly led me to the short-time Fourier transform (STFT). I don't need an invertible transform, but I can use some of its principles. Let's assume that the signal is continuous. If I understand it correctly, I can use the formula

$$ y(t, f) = \left| \int_{-\infty}^{\infty} x(\tau)\ w(\tau-t)\ e^{-2 \pi i f \tau} \ \mathrm{d}\tau \right|, $$

where $x(t)$ is the signal and $w(t)$ is a window function. I read that a Gaussian window function is common:

$$ w(t) = e^{-\frac12 (t/\sigma)^2}, \quad \sigma > 0 $$

Should I use this window function or some other one? If I should use this one, what value should $\sigma$ have, and should it be fixed or dependent on $f$? It seems intuitive to me that the width of the window should be proportional to $1/f$, but no one seems to be doing this. Can anyone tell me why? Is the first formula the way to go, or is there a better way to do what I want?
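For sampled data, the integral above can be approximated by a Riemann sum over the samples. The following is only a minimal sketch of that idea, assuming NumPy; the function name `stft_magnitude` and its parameters are made up for illustration, and the test tone and the value of `sigma` are arbitrary choices, not recommendations.

```python
import numpy as np

def stft_magnitude(x, fs, t, f, sigma):
    """Riemann-sum approximation of |integral of x(tau) w(tau - t) exp(-2 pi i f tau) d tau|.

    x: 1-D array of samples, fs: sampling rate in Hz,
    t: time of interest in seconds, f: frequency in Hz,
    sigma: width of the Gaussian window in seconds.
    """
    tau = np.arange(len(x)) / fs                      # sample times in seconds
    w = np.exp(-0.5 * ((tau - t) / sigma) ** 2)       # Gaussian window centered at t
    kernel = np.exp(-2j * np.pi * f * tau)            # complex exponential at frequency f
    return abs(np.sum(x * w * kernel)) / fs           # d tau is approximated by 1/fs

# Illustrative check with a synthetic 1000 Hz tone (not real data).
fs = 48000
tau = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * tau)
y_on = stft_magnitude(tone, fs, t=0.5, f=1000.0, sigma=0.01)
y_off = stft_magnitude(tone, fs, t=0.5, f=2000.0, sigma=0.01)
print(y_on, y_off)
```

In this sketch `y_on` is far larger than `y_off`, because the window is long enough (in cycles) to separate 1000 Hz from 2000 Hz. With a fixed `sigma`, that separation gets worse at low frequencies, which is one way to see why a window width that depends on $f$ can be appealing.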

I'm also wondering whether I want 'near a frequency' at all; maybe I want 'between two frequencies' instead, or maybe something else. I understand you cannot guess exactly what I want, but I hope someone has some ideas anyway.

There is 1 best solution below

The problem has several variables.

First, you'll need a window (in time) around the moment at which you want to make the determination. Windows of 125 ms are common in many audio applications.
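As a rough illustration of what a 125 ms window means at the 48000 Hz sampling rate from the question, here is a small sketch, assuming NumPy. The helper `extract_segment` is a hypothetical name, and the all-zero array only stands in for the real recording.

```python
import numpy as np

def extract_segment(x, fs, t_center, duration=0.125):
    """Return the samples of x in a window of `duration` seconds centered at t_center."""
    n = int(round(duration * fs))                  # number of samples in the window
    start = int(round(t_center * fs)) - n // 2     # first sample of the window
    if start < 0 or start + n > len(x):
        raise ValueError("window extends beyond the signal")
    return x[start:start + n]

fs = 48000
x = np.zeros(20 * fs)                  # placeholder for 20 s of real audio
seg = extract_segment(x, fs, t_center=10.0)
bin_spacing = fs / len(seg)            # spacing of DFT bins in Hz for this window
print(len(seg), bin_spacing)           # 6000 samples, 8.0 Hz
```

An 8 Hz bin spacing is coarser than the gap between the lowest frequencies in the question (20 Hz and about 25.2 Hz), so a longer window may be worth considering at the low end.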

Once you've selected your sample window, you can choose a window function (you've suggested a Gaussian window). Window functions are mainly used to reduce spectral leakage, the smearing of energy into neighboring frequencies that occurs when a finite segment of the signal is passed through the discrete Fourier transform. A Hamming-type window function is generally a good "middle of the road" choice, with reasonable sensitivity across the range of magnitudes you're looking at.

The third question is how close to the specific frequencies you want to measure the energy. Since this experiment is targeting human hearing, you might want to look up the smallest variation in frequency that the human brain can distinguish. Or you might want to make it another independent variable in the experiment.
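One possible choice of bandwidth, given that the frequencies in the question are spaced a third of an octave apart, is to let each band extend a sixth of an octave on either side of its center, so that adjacent bands touch without overlapping. This is only an assumption for illustration, not something the experiment requires; the variable names are made up, and the sketch assumes NumPy.

```python
import numpy as np

k = np.arange(31)                        # k = 0, 1, ..., 30 as in the question
centers = 20.0 * 2.0 ** (k / 3)          # center frequencies in Hz
lower = centers * 2.0 ** (-1 / 6)        # lower band edges (assumed: 1/6 octave below)
upper = centers * 2.0 ** (1 / 6)         # upper band edges (assumed: 1/6 octave above)

for c, lo, hi in zip(centers[:3], lower[:3], upper[:3]):
    print(f"{c:8.2f} Hz: {lo:8.2f} to {hi:8.2f} Hz")
```

With these edges, the upper edge of each band coincides with the lower edge of the next, and the highest band stays below the Nyquist frequency of 24000 Hz.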

Given a selection of a sample window length, a window function, and a bandwidth (i.e. the margin of frequencies you consider to be "near" your center frequency), you can then run a discrete Fourier transform (DFT) on your sampled, windowed data. You can then sum the squared amplitudes of all of the spectral components inside your selected bands to obtain the total energy "near" the center frequencies.
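The steps above can be sketched roughly as follows, assuming NumPy, a Hamming window, and third-octave-wide bands around the frequencies from the question. The function name `band_energies`, the band edges, and the synthetic 1000 Hz test tone are illustrative assumptions.

```python
import numpy as np

def band_energies(segment, fs, lower, upper):
    """Sum the squared DFT magnitudes of a Hamming-windowed segment inside each band."""
    w = np.hamming(len(segment))                         # Hamming window
    spectrum = np.fft.rfft(segment * w)                  # DFT of the windowed real signal
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)    # frequency of each bin in Hz
    power = np.abs(spectrum) ** 2                        # squared amplitudes
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(lower, upper)])

fs = 48000
k = np.arange(31)
centers = 20.0 * 2.0 ** (k / 3)          # frequencies from the question
lower = centers * 2.0 ** (-1 / 6)        # assumed band edges
upper = centers * 2.0 ** (1 / 6)

n = 6000                                 # 125 ms at 48000 Hz
segment = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)   # synthetic 1000 Hz tone
energies = band_energies(segment, fs, lower, upper)
print(int(np.argmax(energies)), centers[int(np.argmax(energies))])
```

For the test tone, the largest energy lands in the band centered near 1016 Hz (k = 17), as expected. Note that with a 125 ms segment the bins are 8 Hz apart, so the lowest band (roughly 17.8 to 22.4 Hz) contains no bin at all and reports zero energy; a longer segment, or zero-padding, would be needed to say anything about that band.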

Of course, when in doubt, dig into the literature and see what other people in the field are doing. ;)