In the paper, the equation is introduced as follows: "The note transcription method takes as an input the pitch track and outputs discrete notes on a continuous pitch scale, based on Viterbi-decoding of a second, independent hidden Markov model (HMM). [...] The likelihood of a non-silent state emitting a pitch track frame with pitch $q$ is modelled as a Gaussian distribution centered at the note’s pitch $p$ with a standard deviation of $\sigma$ semitones, i.e.

$$P(n_p \mid q) = v \cdot \frac{1}{z}\left[\phi_{p,\sigma}(q)\right]^{\tau},$$

where $n_p$ is a state modelling the MIDI pitch $p$, $z$ is a normalising constant and the parameter $0 < \tau < 1$ controls how much the pitch estimate is trusted; we set $\tau = 0.1$. The probability of unvoiced states is set to $P(\text{unvoiced} \mid q) = (1 - v)/n$, i.e. they sum to their combined likelihood of $(1 - v)$ and $v = 0.5$ is the prior likelihood of a frame being voiced. The standard deviation varies depending on the state: attack states have a larger standard deviation ($\sigma = 5$ semitones) than stable parts ($\sigma = 0.9$)."
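A minimal Python sketch of the emission model described in the quote, assuming the equation has the form $P(n_p \mid q) = v \cdot \frac{1}{z}[\phi_{p,\sigma}(q)]^{\tau}$ with $\phi_{p,\sigma}$ a normal density. The state layout (one attack and one stable state per MIDI pitch, one unvoiced state per pitch, so $n$ equals the number of pitches), the choice of $z$ as the sum that makes the non-silent likelihoods add up to $v$, and the function names are all hypothetical, for illustration only, not the paper's implementation.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density phi_{mu,sigma}(x) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def emission_likelihoods(q, states, n_unvoiced, v=0.5, tau=0.1):
    """Emission likelihoods of one pitch-track frame with pitch q (MIDI units).

    states: list of (p, sigma) pairs, one per non-silent state (hypothetical layout).
    Assumption: z is chosen so that the non-silent likelihoods sum to v.
    """
    # [phi_{p,sigma}(q)]^tau for every non-silent state n_p
    raw = [gaussian_pdf(q, p, sigma) ** tau for p, sigma in states]
    z = sum(raw)  # normalising constant (assumed definition)
    voiced = [v * r / z for r in raw]
    # Unvoiced states share the remaining mass (1 - v) equally: (1 - v) / n each
    unvoiced = [(1 - v) / n_unvoiced] * n_unvoiced
    return voiced, unvoiced


# Hypothetical grid: MIDI pitches 48..72, each with an attack state (sigma = 5)
# and a stable state (sigma = 0.9), plus one unvoiced state per pitch.
pitches = range(48, 73)
states = [(p, 5.0) for p in pitches] + [(p, 0.9) for p in pitches]
voiced, unvoiced = emission_likelihoods(60.3, states, n_unvoiced=len(pitches))

best_state = states[voiced.index(max(voiced))]
print(best_state)  # (60, 0.9)
```

Under these assumptions the likelihoods of one frame sum to 1, and the stable state nearest to the observed pitch receives the largest share.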
I cannot see how this relates to the equations for the Gaussian function listed on Wikipedia.
Am I misunderstanding something, or can someone explain the relationship?
Moreover, does the comma in the subscript of $\phi_{p,\sigma}$ mean "or"?
My guess is that they simply use $\phi_{p,\sigma}(x)$ to mean:
$$\phi_{p,\sigma}(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-p)^2}{2\sigma^2}}$$
or something similar. The comma is there just to separate $p$ and $\sigma$, nothing else.
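For reference, Wikipedia's article on the Gaussian function writes the general form with three parameters $a$, $b$, $c$. Assuming that is the form the question refers to, the density above is the special case in which the peak height $a$ is fixed so that the curve integrates to 1:

$$f(x) = a\,e^{-\frac{(x-b)^2}{2c^2}}, \qquad a = \frac{1}{\sigma\sqrt{2\pi}},\quad b = p,\quad c = \sigma \quad\Longrightarrow\quad f(x) = \phi_{p,\sigma}(x).$$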
However, I feel you have not provided enough context, as there is also a multiplication by $v$ (?!), a division by $z$ (?!) and something denoted as $(\cdot)^{\tau}$ (?!). I am just wary that there might be more to this, hidden in the missing context.
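Assuming the exponent $\tau$ is applied to $\phi_{p,\sigma}(q)$, the power keeps the Gaussian shape and only widens it, because raising to $\tau$ scales the exponent:

$$\left[\phi_{p,\sigma}(q)\right]^{\tau} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\tau} e^{-\frac{\tau (q-p)^2}{2\sigma^2}} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\tau} e^{-\frac{(q-p)^2}{2(\sigma/\sqrt{\tau})^2}} \;\propto\; \phi_{p,\,\sigma/\sqrt{\tau}}(q).$$

So, for a fixed state and as a function of $q$, this term is proportional to a Gaussian with standard deviation $\sigma/\sqrt{\tau}$. With $\tau = 0.1$ that is about $3.16\,\sigma$, e.g. roughly 2.85 semitones for $\sigma = 0.9$; a smaller $\tau$ gives a flatter curve, which is one way to read "how much the pitch estimate is trusted".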
Hope this helps.