Modeling Gaussian components with standard vs exact functions


I'm studying a paper on modeling DNA histograms. It presents two alternative formulas for modeling Gaussian components:

Standard form:

$G(x) = \frac{A}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\Delta x $

Exact form:

$G(x) = \frac{A}{2\sigma}\left(\operatorname{erf}\left(\frac{x + 0.5 - \mu}{\sqrt{2\sigma}}\right) - \operatorname{erf}\left(\frac{x - 0.5 - \mu}{\sqrt{2\sigma}}\right)\right)$

The author recommends the exact form over the standard form, especially when $\sigma$ is small. However, when I plot both in gnuplot, they don't coincide: the exact form produces higher values for the same parameters. I've set A = 1.0, sd = 1.0, mu = 0.0; the purple line is the standard form and the blue line is the exact form.

[Figure: "Normal Functions", gnuplot plot of the standard and exact forms]

No explanation is provided for why one form is better than the other.

Questions

  1. Why don't the functions match? Am I misinterpreting something?
  2. Why would the exact form, which uses the error function, be better than the standard form? My guess is that it is either faster to evaluate or more accurate.

Best answer

You appear to be considering something like:

$$P\left ( x-\frac{\Delta x}{2} \leq X \leq x+\frac{\Delta x}{2} \right )$$

where $X$ is Gaussian with mean $\mu$ and standard deviation $\sigma$. This is exactly equal to

$$\int_{x-\Delta x/2}^{x+\Delta x/2} \frac{1}{\sqrt{2 \pi} \sigma} e^{-(y-\mu)^2/(2\sigma^2)} dy$$

which can be straightforwardly rewritten in terms of the error function, if desired. When $\Delta x$ is much smaller than $\sigma$, we can reasonably approximate this integral using the midpoint rule with a single point: we get

$$\frac{1}{\sqrt{2 \pi} \sigma} e^{-(x-\mu)^2/(2 \sigma^2)} \Delta x.$$
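For a rough numerical check of the above, here is a minimal Python sketch. It is not taken from the paper: the function names are my own, and $\Delta x = 1$ is an assumption read off the $\pm 0.5$ in the question. It uses the standard identity

$$\int_{x-\Delta x/2}^{x+\Delta x/2} \frac{1}{\sqrt{2 \pi} \sigma} e^{-(y-\mu)^2/(2\sigma^2)} dy = \frac{1}{2}\left(\operatorname{erf}\left(\frac{x + \Delta x/2 - \mu}{\sqrt{2}\,\sigma}\right) - \operatorname{erf}\left(\frac{x - \Delta x/2 - \mu}{\sqrt{2}\,\sigma}\right)\right),$$

so the prefactor is $\frac{1}{2}$ (times $A$) and the denominator is $\sqrt{2}\,\sigma$ with $\sigma$ outside the square root. It may be worth checking the formula in the question against the paper on those two points, although with $\sigma = 1$ both readings give the same numbers.

```python
import math


def bin_exact(x, mu, sigma, dx=1.0, A=1.0):
    """A * P(x - dx/2 <= X <= x + dx/2) for X ~ N(mu, sigma^2), via erf."""
    s = math.sqrt(2.0) * sigma  # sigma is outside the square root
    upper = math.erf((x + dx / 2.0 - mu) / s)
    lower = math.erf((x - dx / 2.0 - mu) / s)
    return 0.5 * A * (upper - lower)


def bin_midpoint(x, mu, sigma, dx=1.0, A=1.0):
    """One-point midpoint-rule approximation: A * pdf(x) * dx."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    pdf = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / norm
    return A * pdf * dx


def compare(sigmas=(0.3, 1.0, 5.0), x=0.0, mu=0.0):
    """Return (sigma, exact, midpoint) for the bin centred at x."""
    return [(s, bin_exact(x, mu, s), bin_midpoint(x, mu, s)) for s in sigmas]


if __name__ == "__main__":
    for sigma, exact, approx in compare():
        print(f"sigma={sigma:4.1f}  exact={exact:.6f}  midpoint={approx:.6f}")
```

With $\sigma$ well above $\Delta x$ the two values should agree closely. As $\sigma$ drops below $\Delta x$, the midpoint value at the peak overshoots and can even exceed $A$, which is impossible for the content of a single bin, while the erf form stays bounded and still sums to $A$ over all bins. That is presumably why the author prefers the exact form for small $\sigma$.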