Say one wants to fit a curve $f(x)$ to a set of noisy data points $(x_i, y_i)$. If the error for each point $y_i$ is assumed to be normally distributed with variance $\sigma_i^2$, one wants to find the curve that minimizes the sum of weighted errors, commonly given by: $$\text{Weighted Error}=\sum_i\left(\frac{y_i-f(x_i)}{\sigma_i}\right)^2$$

I'm looking for the equivalent of this expression for the discrete case, i.e. when, instead of a normal distribution, the probability that $f(x_i)$ passes through any given $y$ is defined by a discrete probability distribution of the form: $$P(f(x_i)=y)=\sum_j A_j\,\delta(y,y_j)$$ where $\delta$ is the Kronecker delta. My questions for today:
- I assume the corresponding quantity for the weighted error, in analogy with the negative log-likelihood of the normal distribution, is given by: $$G=-\sum_i\log P(f(x_i)=y_i)$$ Is this assumption correct?
- How would one find the derivative of this quantity in the discrete case? I.e. finding the value of: $$\frac{dG}{dx}=-\frac{d\log P(f(x))}{dx}$$
- In general, why is the sum of weighted errors usually minimized instead of the expected value, a.k.a. the information entropy?
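For reference, here is a minimal sketch of the continuous (Gaussian) quantity I'm starting from, with made-up data points and a hypothetical candidate curve (the names `weighted_error`, `f`, and the numbers are just for illustration):

```python
import numpy as np

def weighted_error(f, x, y, sigma):
    """Sum of squared residuals, each scaled by that point's standard deviation."""
    return np.sum(((y - f(x)) / sigma) ** 2)

# Hypothetical example: three noisy points against a candidate line.
f = lambda x: 2.0 * x + 1.0
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.1, 2.9, 5.2])
sigma = np.array([0.1, 0.1, 0.2])
print(weighted_error(f, x, y, sigma))  # ≈ 3.0: each residual is one sigma
```

Up to an additive constant, this is exactly the negative log-likelihood of the data under per-point Gaussian noise, which is what motivates my first question.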
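To illustrate why the derivative in the discrete case seems problematic: since $P$ is a sum of Kronecker deltas, $-\log P$ takes the value $-\log A_j$ exactly on the support points $y_j$ and is $+\infty$ everywhere else, so it is not differentiable in the usual sense. A sketch with a hypothetical three-point distribution (the support values and weights are invented for illustration):

```python
import numpy as np

# Hypothetical discrete distribution: support points y_j with weights A_j.
y_support = np.array([0.0, 1.0, 2.0])
A = np.array([0.25, 0.5, 0.25])  # weights sum to 1

def G(y):
    """Negative log-probability of a predicted value y.
    Only exact hits on a support point have nonzero probability."""
    hits = np.isclose(y_support, y)  # Kronecker delta: exact matches only
    p = A[hits].sum()
    return -np.log(p) if p > 0 else np.inf

print(G(1.0))  # -log(0.5), on a support point
print(G(0.5))  # inf: off the support, P = 0
```

Between support points $G$ is infinite, and on them it is a set of isolated finite values, so $dG/dx$ is either undefined or trivially zero, which is what prompts my second question.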
Any help gladly appreciated.