I have a brief mono audio sample (a recording of a single note played on a guitar) that looks like this:
As you can see in the image, the signal is losing intensity over time. My goal is to quantify the rate of decay of this note, i.e. how quickly it is losing intensity. I can do this heuristically, e.g. by taking the ratio between the first and last peak, but this approach is not very robust because it only uses two of the sample points and is thus highly sensitive to random noise.
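To make the heuristic concrete, here's a sketch of the peak-ratio estimate on a synthetic damped sinusoid (the parameters `lam = 3.0`, `f = 110` Hz, `fs = 8000` Hz are hypothetical stand-ins for my recording):

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic stand-in for the recording: a single damped sinusoid.
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.exp(-3.0 * t) * np.sin(2 * np.pi * 110 * t)

# Locate the local maxima of the oscillation.
peaks, _ = find_peaks(x)

# Heuristic: decay rate from the ratio of the first and last peak heights.
t1, tN = t[peaks[0]], t[peaks[-1]]
A1, AN = x[peaks[0]], x[peaks[-1]]
lam_est = np.log(A1 / AN) / (tN - t1)
```

On this noise-free signal `lam_est` comes out near 3.0, but any noise on those two peaks feeds straight into the estimate, which is exactly the problem.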
If possible, I'd like to "divide out" the oscillation component of the signal (probably using a Fourier transform) and then fit an exponential decay equation to the result. This approach would make use of the entire audio sample, and therefore be less sensitive to error in an individual sample point. However, I do not know if this approach is viable, and I don't understand the details of how to carry this out.
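One way I've seen to "divide out" the oscillation without knowing the frequency in advance is to take the magnitude of the analytic signal (Hilbert transform) as an amplitude envelope, then fit the decay to the log-envelope by linear least squares. A sketch on a synthetic damped sinusoid (again with made-up parameters `lam = 3.0`, `f = 110` Hz, `fs = 8000` Hz):

```python
import numpy as np
from scipy.signal import hilbert

# Synthetic stand-in for the guitar sample.
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.exp(-3.0 * t) * np.sin(2 * np.pi * 110 * t)

# The magnitude of the analytic signal is the amplitude envelope;
# this removes the oscillation without needing to know its frequency.
envelope = np.abs(hilbert(x))

# Fit A * exp(-lam * t) by linear least squares on the log-envelope,
# trimming the edges where the Hilbert transform has boundary artifacts.
mask = slice(fs // 100, -fs // 100)
slope, intercept = np.polyfit(t[mask], np.log(envelope[mask]), 1)
lam_est = -slope
```

I don't know whether this stays well-behaved when several partials with different decay rates are superimposed, which is part of what I'm asking below.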
My attempt
We know from differential equations that the position of a damped harmonic oscillator is given by (after scaling and translation)
$$x(t) = \exp(-\lambda t) \sin( \omega t),$$
where $\lambda$ is a damping coefficient and $\omega$ is the frequency. So, one idea is to fit the equation above to my input data (e.g. using least squares). But a guitar string doesn't oscillate at just one frequency $\omega$—my input signal is really the composite of several frequencies, which depend on the physical properties of the string and guitar body. To get frequency components from time series data, we normally use a (discrete, in this case) Fourier transform.
I'm stuck because it seems like I need to fit a damped harmonic oscillator model and perform a Fourier transform simultaneously. I know how to do these tasks individually, but not at the same time.
My questions
- Is there a standard way to estimate the decay coefficient from time series data like this?
- Can it be done programmatically? (That is, in a way that doesn't require manually labeling peaks, specifying the fundamental frequency, etc.)
I have seen references online to something called RT60 estimation, used to quantify how quickly sound decays in a room, but that seems like a somewhat different problem: there, the task is to integrate impulse-response data sampled at many different positions in the room, whereas I have just a single time series and no notion of position within the room.
