Suppose I have a histogram $N$ whose bins all have width $\Delta x$ and are indexed by $i$. The count in a single bin is then $N_{i}$.
I wish to estimate the empirical density for a certain bin. Defining $N_{Total} = \sum_{i} N_{i}$ (the sum over all bins) and $\hat{N}_{i}$ as the empirical density of a single bin, I can use:
$\hat{N}_{i} = \frac{N_{i}}{N_{Total}\,\Delta x}$
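A minimal sketch of this estimator in Python, assuming the counts are held in a plain list and every bin has the same width; `empirical_density` and the five example counts are hypothetical, chosen only for illustration:

```python
def empirical_density(counts, bin_width):
    """Return N_i / (N_Total * bin_width) for every bin count N_i."""
    n_total = sum(counts)
    if n_total == 0:
        raise ValueError("histogram has no counts")
    return [n_i / (n_total * bin_width) for n_i in counts]


# Hypothetical histogram: five bins of width 0.5, 50 points in total
counts = [3, 12, 20, 11, 4]
bin_width = 0.5
density = empirical_density(counts, bin_width)

# Each bar has area density[i] * bin_width; together the areas sum to 1.
total_area = sum(d * bin_width for d in density)
print(density)
print(total_area)
```

Because the bar areas sum to one (up to floating-point rounding), the result can be drawn on the same axes as a probability density curve.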
This is a standard method for measuring empirical density. For instance, see here.
Suppose I then wish to calculate the error. I assume the counts follow a Poisson distribution, so that $dN_{i} = \sqrt{N_{i}}$ and $dN_{Total} = \sqrt{N_{Total}}$.
My solution for the error, using the chain rule, is:
$d\hat{N}_{i} = \hat{N}_{i}\sqrt{\frac{1}{N_{i}} + \frac{1}{N_{Total}}}$.
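This formula follows from first-order error propagation for a ratio: the relative errors of $N_i$ and $N_{Total}$ are added in quadrature, $(d\hat{N}_i/\hat{N}_i)^2 = (dN_i/N_i)^2 + (dN_{Total}/N_{Total})^2 = 1/N_i + 1/N_{Total}$. That step treats the two counts as independent and $\Delta x$ as exact. A sketch of the calculation in Python, with hypothetical counts; returning `nan` for an empty bin is an assumption of this sketch, since $1/N_i$ is undefined there:

```python
import math


def density_with_error(counts, bin_width):
    """Return (density, error) per bin from d N_hat = N_hat * sqrt(1/N_i + 1/N_Total).

    Assumes Poisson errors dN_i = sqrt(N_i) and dN_Total = sqrt(N_Total),
    treats N_i and N_Total as independent, and takes bin_width as exact.
    """
    n_total = sum(counts)
    out = []
    for n_i in counts:
        dens = n_i / (n_total * bin_width)
        if n_i == 0:
            err = float("nan")  # 1/N_i is undefined for an empty bin
        else:
            err = dens * math.sqrt(1 / n_i + 1 / n_total)
        out.append((dens, err))
    return out


counts = [3, 12, 20, 11, 4]  # hypothetical bin counts, bin width 0.5
for dens, err in density_with_error(counts, 0.5):
    print(f"{dens:.3f} +/- {err:.3f}")
```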
This solution, however, does not include the error on $\Delta x$, which I am fairly sure influences $\hat{N}_{i}$ and thus $d\hat{N}_{i}$.
Is my solution correct as is?
Should the error on $\Delta x$ be included in the calculation as well, if this is even applicable?
If so, should the error on the bin width be $d \Delta x = \frac{\Delta x}{2}$?
Some comments on density estimation. As you pursue your efforts to approximate a population density with histograms, here is some background information you may find helpful.
Density histograms. In R, you can use the parameter `prob=T` with the `hist` procedure to get a histogram in which the total area of all bars is $1.$ That makes it feasible to plot, on the same axes, the density curve of the continuous distribution from which the data were randomly sampled. For reasonably large samples there is usually a good match between the histogram and the density function.

Consider a random sample `x` of size $n = 500$ from the distribution $\mathsf{Gamma}(\mathrm{shape}=\alpha=6, \mathrm{rate} = \lambda = 0.1),$ which has mean $\mu = \alpha/\lambda = 60.$

The width of each bar is 20 and I have put labels to show the heights (densities) of the bars. [Also, see the Note at the end.]
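For readers without R, here is a rough Python analog of a density histogram with bar width 20, using only the standard library. It is a sketch, not the answer's code: the seed is arbitrary (so the counts will not match the answer's figure), `gamma_pdf` is a hypothetical helper, and the right-closed bins $(a, b]$ are chosen to match the default of R's `hist`. Note that `random.gammavariate` takes shape and *scale*, so the rate $\lambda = 0.1$ becomes scale $10$.

```python
import bisect
import math
import random

random.seed(1)  # arbitrary seed; this is not the sample used in the answer
alpha, rate = 6, 0.1
# random.gammavariate takes (shape, scale), so pass scale = 1 / rate = 10
x = [random.gammavariate(alpha, 1 / rate) for _ in range(500)]

width = 20
upper = width * math.ceil(max(x) / width)     # smallest multiple of 20 >= max(x)
edges = list(range(0, upper + width, width))  # cutpoints 0, 20, 40, ...

counts = [0] * (len(edges) - 1)
for v in x:
    # Right-closed bins (a, b], the default in R's hist()
    i = max(bisect.bisect_left(edges, v) - 1, 0)
    counts[i] += 1

# Density scaling: bar height = count / (n * width), so total area is 1
density = [c / (len(x) * width) for c in counts]


def gamma_pdf(t, shape, rate):
    """Gamma density in the shape/rate parameterisation."""
    return rate**shape * t**(shape - 1) * math.exp(-rate * t) / math.gamma(shape)


for lo, d in zip(edges, density):
    mid = lo + width / 2
    print(f"({lo:3d}, {lo + width:3d}]  histogram {d:.4f}  "
          f"Gamma pdf at midpoint {gamma_pdf(mid, alpha, rate):.4f}")
```

Comparing each bar height with the density at the bar's midpoint gives a numerical version of the visual match described above.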
Kernel density estimation (KDE). A kernel density estimator seeks to approximate the population density function by using a mixture of 'kernels' (shapes, which can be chosen to be rectangles, normal density functions, etc.). You may want to read the Wikipedia article on 'kernel density estimation' and perhaps some of its references, especially those by B. Silverman.
Here is the same histogram as above. The default KDE from R is shown as a dotted curve. Tick marks along the horizontal axis show positions of the 500 observations.
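A minimal Gaussian KDE in Python, sketched under the assumption that it should mirror R's default `density()` settings (Gaussian kernel, `bw.nrd0` bandwidth $0.9 \min(s, \mathrm{IQR}/1.34)\, n^{-1/5}$). R evaluates its estimate on a grid with an FFT, so the numbers may differ slightly. The helper names are hypothetical, and the sample is regenerated with an arbitrary seed rather than taken from the answer:

```python
import math
import random
import statistics


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * n ** (-1/5).

    Assumes the data are not all identical (otherwise the result is 0).
    """
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 0.9 * min(sd, (q3 - q1) / 1.34) * len(data) ** (-0.2)


def gaussian_kde(data, h):
    """Return f(t) = (1 / (n h)) * sum_j phi((t - x_j) / h), phi the N(0, 1) pdf."""
    scale = 1 / (len(data) * h * math.sqrt(2 * math.pi))

    def f(t):
        return scale * sum(math.exp(-0.5 * ((t - xj) / h) ** 2) for xj in data)

    return f


random.seed(1)  # arbitrary seed; not the answer's sample
x = [random.gammavariate(6, 10) for _ in range(500)]  # Gamma(shape 6, rate 0.1)

h = silverman_bandwidth(x)
f = gaussian_kde(x, h)
# No histogram cutpoints enter the estimate: only the data and h do.
for t in (20, 40, 60, 80, 100, 140):
    print(f"t = {t:3d}  KDE {f(t):.4f}")
```

Since the estimator sees only the observations and the bandwidth, changing the histogram's bins cannot change it.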
The KDE is computed without regard to the histogram; if I had chosen different cutpoints for the histogram bars, the KDE would be the same. [Below, parameter `br=25` was used to 'suggest' using more bars, perhaps too many.]

Empirical CDFs (ECDFs). The empirical CDF of a sample of size $n$ is made by sorting the sample from smallest to largest. Starting at $0$ on the left, the ECDF increases by $1/n$ at each observation, reaching $1$ at the right.
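A stdlib-only Python sketch of the ECDF, compared with the $\mathsf{Gamma}(6, 0.1)$ CDF at a few points. Python's standard library has no Gamma CDF, so this sketch uses the closed form that holds only for an integer shape (the Erlang case, which covers $\alpha = 6$). The helper names are hypothetical, and the sample is regenerated with an arbitrary seed, not the answer's:

```python
import bisect
import math
import random


def ecdf(data):
    """Return F_n(t) = (number of observations <= t) / n."""
    xs = sorted(data)
    n = len(xs)

    def F(t):
        # bisect_right gives the count of sorted values that are <= t
        return bisect.bisect_right(xs, t) / n

    return F


def gamma_cdf_integer_shape(t, shape, rate):
    """Erlang CDF: 1 - sum_{k < shape} e^(-rate t) (rate t)^k / k!  (integer shape only)."""
    if t <= 0:
        return 0.0
    lt = rate * t
    return 1 - math.exp(-lt) * sum(lt**k / math.factorial(k) for k in range(shape))


random.seed(1)  # arbitrary seed; not the answer's sample
x = [random.gammavariate(6, 10) for _ in range(500)]  # Gamma(shape 6, rate 0.1)
F_n = ecdf(x)

for t in (20, 40, 60, 80, 100, 140):
    print(f"t = {t:3d}  ECDF {F_n(t):.3f}  "
          f"Gamma CDF {gamma_cdf_integer_shape(t, 6, 0.1):.3f}")
```

No bin width or cutpoints appear anywhere in `ecdf`, which is the reason the ECDF avoids the arbitrariness of binning.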
Often an ECDF gives a more accurate view of the CDF of the population than a histogram gives of the density function (because ECDFs do not rely on arbitrary binning). Below, the ECDF of `x` is compared with the CDF of $\mathsf{Gamma}(6, 0.1).$

Note: In R, a non-plotted histogram is a list of details of the numbers used to make a histogram, some of which are copied below: