Starting with a sampled audio signal of a cappella vocals, I am interested in determining how the tonal center of the music shifts over the course of the performance.
As a choir progresses through a performance of a piece of music, they often tend to drift downwards, so that the key they finish in is one, two, or more semitones below the key they started in. This "flatting" is not necessarily spread evenly over the duration of the piece, and often the lion's share of it will occur at a troublesome chord or key change. Identifying these instances enables us to home in more quickly on the problem areas.
At any point in the performance, the signal will contain anywhere from zero to four voices. Because of the nature of the human voice's resonators and articulators, each voice's waveform will undergo spectrum shaping as word sounds are sung and diphthongs are formed, but its fundamental pitch should remain more or less constant.
I'm interested in comments as to how one might measure this pitch shift, presumably using some variant of the FFT.
I believe the problem can be reduced to one of accurately determining the fundamental frequency of each of the four voices. From those frequencies, and knowing the key the song is being sung in, it should be possible to work out what the chord is, and therefore what the four fundamental frequencies should be, and hence how much the pitch has dropped. Given singers of some experience, each of the four voices will adapt to the new tonal center as the pitch drops, so the relative tuning between the parts will remain true; that is, reasonably well-tuned chords will still be produced, just in progressively lower keys.
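To make the last step concrete, here is a minimal sketch of how the drop might be computed once the four fundamentals are known. It assumes equal temperament and a reference of A4 = 440 Hz, neither of which is given above, and the function names are my own invention. It measures how far each voice sits from the nearest equal-tempered pitch, in cents, and takes the median across the voices as the offset of the chord as a whole.

```python
import math
from statistics import median

A4_HZ = 440.0  # assumed reference tuning; not stated in the problem


def cents_from_nearest_semitone(freq_hz, ref_hz=A4_HZ):
    """Signed offset, in cents, of freq_hz from the nearest equal-tempered pitch."""
    semitones = 12.0 * math.log2(freq_hz / ref_hz)
    return 100.0 * (semitones - round(semitones))


def chord_offset_cents(fundamentals_hz, ref_hz=A4_HZ):
    """Median offset across the voices present; negative means flat."""
    return median(cents_from_nearest_semitone(f, ref_hz) for f in fundamentals_hz)


# Hypothetical example: an A major chord (A3, C#4, E4, A4) sung 30 cents flat.
chord = [440.0 * 2 ** ((n - 0.3) / 12) for n in (-12, -8, -5, 0)]
print(chord_offset_cents(chord))  # a value very close to -30
```

Two caveats with this sketch. The offset wraps around at plus or minus 50 cents, so a drift of a semitone or more would have to be tracked from one analysis frame to the next rather than read off a single chord. Also, a choir that tunes its chords purely will deviate from equal temperament by several cents on some chord tones even when the tonal center has not moved, which is part of the reason for taking the median rather than trusting any single voice.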