Implementing Statistical Pitchmark Correction Method formula in programming code


I am trying to follow along with a paper. It explains how automatically estimated pitchmarks (certain points in time in human speech) can be corrected if they are obviously wrong.

They write:

3.2. Statistical Pitchmark Correction Method

We also observed another problem, which is related to the voice characteristics used in the recording of the ATR database. The voice is prone to pitchmark position errors, which is critical for speech concatenation using pitch-synchronous methods (e.g. PS-OLA). Advanced pitchmark labeling algorithms produce up to 10 errors per 5 seconds of each utterance (the error rate depends on the voice characteristics and on the utterance as well). These errors mostly occur when an algorithm locates more or fewer glottal closure instants in a given part of the speech signal than there are in reality. Then we can finally observe:

  1. missing pitchmark in glottal closure instant,
  2. multiple pitchmarks near glottal closure instant.

Based on that observation, we decided to implement a simple method to correct these errors. We called it the Statistical Pitchmark Correction Method. First we need to prepare a vector V (containing one value per pitchmark) using the formula:

[image: formula for V_i, not available as text]

where: Δt_i - pitch period duration of pitchmark i; Δt_i = t_{i+1} − t_i

N - stands for window length 2 * N

V_i stands for the pitch period duration of pitchmark i relative to the average pitch period duration (for window size 2 * N). Voice frequency doesn't change dramatically in regular speech, so the value V_i should change smoothly in time and oscillate near 1. Based on this we are able to detect pitchmark problems. A value of V_i close to or greater than 2 means there is probably a missing pitchmark; a V_i value close to 0 means there are probably multiple pitchmarks. Having used this simple method in Ivona Speech Synthesis, we reduced concatenation errors and gained "smoother" sounding speech.

Could anybody tell me how to convert this formula to a programming language like C#?

I am struggling with other challenges in this approach, and I would not like to make a mistake when implementing this formula. I am really not sure that I understand it correctly.

I also don't understand how to correct the detected errors using this formula.
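The quoted passage only describes detection, so the following Python sketch is one plausible correction step, not the paper's method. It assumes a vector V with one ratio per pitch period is already available. A period with a large ratio is split by inserting evenly spaced marks, and a period with a small ratio is merged into the next one by dropping the mark that ends it. The function name `correct_pitchmarks`, the thresholds, and the choice of which mark to drop are all assumptions.

```python
from typing import List


def correct_pitchmarks(t: List[float], v: List[float],
                       low: float = 0.5, high: float = 1.5) -> List[float]:
    """Return a repaired copy of t; v[i] rates the period t[i] .. t[i+1]."""
    if len(v) != len(t) - 1:
        raise ValueError("v must hold exactly one ratio per pitch period")
    out = [t[0]]
    merged = False  # True right after a mark was dropped
    for i, ratio in enumerate(v):
        if merged:
            # This period was absorbed by the previous one; keep its end mark.
            out.append(t[i + 1])
            merged = False
        elif ratio <= low:
            # Period too short: assume t[i+1] is spurious and drop it.
            merged = True
        elif ratio >= high:
            # Period too long: assume round(ratio) - 1 marks are missing.
            count = max(1, round(ratio) - 1)
            step = (t[i + 1] - t[i]) / (count + 1)
            out.extend(t[i] + step * k for k in range(1, count + 1))
            out.append(t[i + 1])
        else:
            out.append(t[i + 1])
    return out
```

Recomputing V on the returned marks and repeating until nothing is flagged would be a natural extension, but whether that converges on real speech data is untested here.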

Thank you for any help.