How does this equation relate to a Gaussian distribution and what does the comma mean?

Before the following equation, the text says "The note transcription method takes as an input the pitch track and outputs discrete notes on a continuous pitch scale, based on Viterbi-decoding of a second, independent hidden Markov model (HMM). [...] The likelihood of a non-silent state emitting a pitch track frame with pitch $q$ is modelled as a Gaussian distribution centered at the note’s pitch $p$ with a standard deviation of $\sigma$ semitones, i.e.

[Image: equation from the paper]

where $n_p$ is a state modelling the MIDI pitch $p$, $z$ is a normalising constant and the parameter $0 < \tau < 1$ controls how much the pitch estimate is trusted; we set $\tau = 0.1$. The probability of unvoiced states is set to $P(\text{unvoiced} \mid q) = (1 - v)/n$, i.e. they sum to their combined likelihood of $(1 - v)$, and $v = 0.5$ is the prior likelihood of a frame being voiced. The standard deviation varies depending on the state: attack states have a larger standard deviation ($\sigma = 5$ semitones) than stable parts ($\sigma = 0.9$)."

I cannot see how this relates to these equations for Gaussian functions listed on Wikipedia:

[Images: two formulas for Gaussian functions from Wikipedia]

Am I misunderstanding or can someone explain the relationship?

Also, does the comma in the equation mean "or"?

Best answer

My guess is that they simply use $\phi_{p,\sigma}(x)$ to mean:

$$\phi_{p,\sigma}(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-p)^2}{2\sigma^2}}$$

or something similar, that is, the normal density with mean $p$ and standard deviation $\sigma$. The comma just separates the two parameters $p$ and $\sigma$ in the subscript; it does not mean "or".

However, I feel you have not provided enough context, as there is a further multiplication by $v$ (?!), a division by $z$ (?!) and something denoted $(\cdot)^{\tau}$ (?!). I am wary that there might be more to this, hidden in the missing context.
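To make the pieces concrete, here is a minimal Python sketch of how such an expression could be evaluated. It assumes, going only by the terms mentioned above, that the formula has the shape $P(n_p \mid q) = v \cdot \phi_{p,\sigma}(q)^{\tau} / z$, and that $z$ is chosen so the note states sum to $v$. Both are my assumptions, not something confirmed by the paper. The function names and the pitch range are made up for illustration.

```python
import math

def gaussian_pdf(x, p, sigma):
    """Normal density phi_{p,sigma}(x): mean p, standard deviation sigma."""
    return math.exp(-((x - p) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def note_state_probs(q, pitches, sigma=0.9, tau=0.1, v=0.5):
    """ASSUMED form: P(n_p | q) = v * phi_{p,sigma}(q)**tau / z.

    z is ASSUMED to normalise the tempered densities so that the
    note states together carry probability v.
    """
    tempered = [gaussian_pdf(q, p, sigma) ** tau for p in pitches]
    z = sum(tempered)
    return [v * t / z for t in tempered]

# Hypothetical example: seven candidate MIDI pitches, observed pitch 60.3
pitches = list(range(57, 64))
probs = note_state_probs(60.3, pitches)
best_pitch = pitches[probs.index(max(probs))]
```

Raising the density to a power $\tau < 1$ flattens it, so neighbouring pitches keep a noticeable share of the probability. That would fit the description of $\tau$ as controlling how much the pitch estimate is trusted.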

Hope this helps.