I have a data set that can be fit well by a single Gaussian model, with dependent variables $y_i$ and independent variables $x_i$, $i=1,\dots,N$. I want to avoid using a nonlinear fitting library, and to that end I worked through the math and arrived at a pseudo-analytical solution to the least-squares problem. If we set up
$y=A\exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right)$
Then taking $\ln$ of both sides and some straightforward matrix algebra gives the following matrix equation:
$ \left( \begin{array}{c} -\frac{1}{2\sigma^2} \\ \frac{\mu}{\sigma^2} \\ \ln A-\frac{\mu^2}{2\sigma^2} \end{array} \right) = \left( \begin{array}{ccc} \sum_{i} x_i ^4 & \sum_{i} x_i ^3 & \sum_{i} x_i ^2\\ \sum_{i} x_i ^3 & \sum_{i} x_i ^2 & \sum_{i} x_i \\ \sum_{i} x_i ^2 & \sum_{i} x_i & N \end{array} \right)^{-1} \left( \begin{array}{c} \sum_{i} x_i^2\ln y_i \\ \sum_{i} x_i\ln y_i \\ \sum_{i} \ln y_i \end{array} \right)$
And so obtaining the best-fit parameters just involves computing those sums and performing a single $3\times 3$ matrix inversion.
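A minimal NumPy sketch of this closed-form solve (the function name is my choice, and I use `np.linalg.solve` on the normal equations rather than forming the explicit inverse; it assumes all $y_i > 0$):

```python
import numpy as np

def gaussian_fit_loglinear(x, y):
    """Fit y = A*exp(-(x-mu)^2 / (2*sigma^2)) by least squares on ln(y).

    Solves the 3x3 normal equations for the quadratic
    ln(y) = a*x^2 + b*x + c, then recovers (A, mu, sigma).
    Assumes all y > 0.
    """
    x = np.asarray(x, dtype=float)
    z = np.log(np.asarray(y, dtype=float))

    # Design matrix of the quadratic in x; M.T @ M is exactly the 3x3
    # matrix of sums (x^4, x^3, ..., N) written out in the question.
    M = np.column_stack([x**2, x, np.ones_like(x)])
    a, b, c = np.linalg.solve(M.T @ M, M.T @ z)

    sigma = np.sqrt(-1.0 / (2.0 * a))         # a = -1/(2*sigma^2)
    mu = b * sigma**2                         # b = mu/sigma^2
    A = np.exp(c + mu**2 / (2.0 * sigma**2))  # c = ln A - mu^2/(2*sigma^2)
    return A, mu, sigma
```

On noise-free Gaussian data this recovers the parameters essentially exactly, which is a useful sanity check before feeding it real histograms.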
So far so good. In practice, however, because my data set is a histogram, the method tends to fail: real data can have $y_i=0$ for some values of $i$, where $\ln y_i$ is undefined. It is sometimes possible to work around this by replacing the zeros with very small values at the level of numerical noise, but that is not sufficiently general.
Is there any way to rescue this approach? The iterative least-squares version of this problem obviously handles zeros gracefully; is there any way to fix this one to do the same?
If you want to stay with a linearized model, then to avoid the problem you mention (small values of $y$ producing large residuals in log space), you could weight the residuals by $y_i$, as shown here.
To explain why this works quite well: with the nonlinear regression, the residual is $$r_i^{(1)}=y_i^{calc}-y_i^{exp}$$ When you linearize the model by a logarithmic transform, the residual becomes $$r_i^{(2)}=\log(y_i^{calc})-\log(y_i^{exp})$$ which gives the small values a disproportionately large contribution. But rewriting $$r_i^{(2)}=\log(y_i^{calc})-\log(y_i^{exp})=\log\Big(\frac{y_i^{calc}} {y_i^{exp}} \Big)=\log\Big(1+\frac{y_i^{calc}-y_i^{exp}} {y_i^{exp}} \Big)$$ and assuming that the errors are not too large, $$r_i^{(2)}\approx \frac{y_i^{calc}-y_i^{exp}} {y_i^{exp}} $$ So $$r_i^{(3)}=y_i^{exp}\,r_i^{(2)}=y_i^{exp}\,\Big(\log(y_i^{calc})-\log(y_i^{exp})\Big)\approx y_i^{calc}-y_i^{exp}=r_i^{(1)}$$
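A sketch of the weighted variant in NumPy (my own illustration, not a standard routine): minimizing $\sum_i \big(r_i^{(3)}\big)^2$ is a weighted least-squares problem on $\ln y$ with weights $w_i = (y_i^{exp})^2$, since the residual $y_i^{exp}\,r_i^{(2)}$ gets squared. Bins with $y_i=0$ carry zero weight, so they can simply be dropped and no $\log(0)$ is ever taken:

```python
import numpy as np

def gaussian_fit_weighted(x, y):
    """Weighted log-linear Gaussian fit minimizing sum of (y_i * r_i^(2))^2,
    i.e. the r^(3) residual. Zero bins have zero weight, so they are
    dropped up front and log(0) never occurs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > 0          # zero-weight points contribute nothing
    x, y = x[keep], y[keep]
    z = np.log(y)

    # Weighted normal equations  M^T W M beta = M^T W z  with W = diag(y^2).
    w = y**2
    M = np.column_stack([x**2, x, np.ones_like(x)])
    Mw = M * w[:, None]
    a, b, c = np.linalg.solve(M.T @ Mw, Mw.T @ z)

    sigma = np.sqrt(-1.0 / (2.0 * a))
    mu = b * sigma**2
    A = np.exp(c + mu**2 / (2.0 * sigma**2))
    return A, mu, sigma
```

Equivalently, `np.polyfit(x, np.log(y), 2, w=y)` on the nonzero bins gives the same quadratic coefficients, since `polyfit` multiplies each residual by its weight before squaring.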