I'm studying a paper on modeling DNA histograms. It presents two alternative formulas for modeling Gaussian components:
Standard form:
$G(x) = \frac{A}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\Delta x $
Exact form:
$G(x) = \frac{A}{2\sigma}(erf(\frac{x + 0.5 - \mu}{\sqrt{2\sigma}}) - erf(\frac{x - 0.5 - \mu}{\sqrt{2\sigma}}))$
The author recommends the exact form over the standard form, especially when $\sigma$ is small. However, when I plot both in gnuplot, they don't match: the exact form produces higher values for the same parameters. I've set $A = 1.0$, $\sigma = 1.0$, $\mu = 0.0$; the purple line is the standard form and the blue line is the exact form.
No explanation is provided for why one form is better than the other.
Questions
- Why don't the functions match? Am I misinterpreting something?
- Why would the exact form, which uses the error function, be better than the standard form? Is it numerically faster, or more accurate?
You appear to be considering something like:
$$P\left ( x-\frac{\Delta x}{2} \leq X \leq x+\frac{\Delta x}{2} \right )$$
where $X$ is Gaussian with mean $\mu$ and standard deviation $\sigma$. This is exactly equal to
$$\int_{x-\Delta x/2}^{x+\Delta x/2} \frac{1}{\sqrt{2 \pi} \sigma} e^{-(y-\mu)^2/(2\sigma^2)} dy$$
which can be straightforwardly rewritten in terms of the error function, if desired. When $\Delta x$ is much less than $\sigma$, we can reasonably approximate this integral using the midpoint rule with $1$ point: we get
$$\frac{1}{\sqrt{2 \pi} \sigma} e^{-(x-\mu)^2/(2 \sigma^2)} \Delta x.$$
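To make the comparison concrete, here is a minimal numerical sketch. It assumes unit-width bins ($\Delta x = 1$) and uses the closed form of the integral above,

$$\frac{1}{2}\left[\operatorname{erf}\left(\frac{x+\Delta x/2-\mu}{\sqrt{2}\,\sigma}\right)-\operatorname{erf}\left(\frac{x-\Delta x/2-\mu}{\sqrt{2}\,\sigma}\right)\right],$$

scaled by the amplitude $A$. The function names and the `dx` and `amplitude` parameters are my own choices for illustration, not taken from the paper.

```python
import math


def gaussian_bin_exact(x, mu, sigma, dx=1.0, amplitude=1.0):
    """Gaussian mass in the bin [x - dx/2, x + dx/2], via the error function."""
    scale = math.sqrt(2.0) * sigma  # note: sqrt(2) * sigma, not sqrt(2 * sigma)
    upper = math.erf((x + dx / 2.0 - mu) / scale)
    lower = math.erf((x - dx / 2.0 - mu) / scale)
    return 0.5 * amplitude * (upper - lower)


def gaussian_bin_midpoint(x, mu, sigma, dx=1.0, amplitude=1.0):
    """One-point midpoint-rule approximation: density at x times bin width."""
    density = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
    density /= math.sqrt(2.0 * math.pi) * sigma
    return amplitude * density * dx


# Compare the two forms at the peak (x = mu = 0) for wide and narrow Gaussians.
for sigma in (5.0, 1.0, 0.3):
    exact = gaussian_bin_exact(0.0, 0.0, sigma)
    approx = gaussian_bin_midpoint(0.0, 0.0, sigma)
    print(f"sigma={sigma}: exact={exact:.4f}, midpoint={approx:.4f}")
```

With $\sigma = 5$ the two values agree closely. With $\sigma = 1$ they already differ by a few percent: the midpoint value is slightly higher at the peak and the exact value is higher in the tails. With $\sigma = 0.3$ the midpoint value at the peak exceeds $1$, which cannot be the mass of a single bin, while the exact bin masses always sum to $A$. That is presumably why the exact form is recommended for small $\sigma$: it is about accuracy rather than speed. Also note that with the prefactor $\frac{A}{2\sigma}$ and denominator $\sqrt{2\sigma}$ as typed in the question, the exact form only agrees with this closed form when $\sigma = 1$.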