This is an excerpt from BW Silverman's 'Density Estimation for Statistics and Data Analysis.'
The oldest and most widely used density estimator is the histogram. Given an origin $x_0$ and a bin width $h$, we define the bins of the histogram to be the intervals $[x_0 + mh, x_0 + (m+1)h)$ for integers $m$. The histogram is defined by
$$ \hat{f}(x) = \dfrac{1}{nh}(\text{no. of $X_i$ in the same bin as $x$})$$
Please help me understand how this became so.
Since I am seeing $\hat{f}$, does this mean that we are talking about an estimator? I also do not understand the formula: I am not sure about the switch from $m$ to $n$, and have just assumed that they are the same thing.
Any insights would be appreciated.
Yes, $\hat f$ is an estimator. But what it estimates is not a scalar quantity, or even a vector-valued parameter. Rather, it is an estimator of a function. The particular function being estimated is the true underlying probability density from which the sample was presumed to have been drawn. As such, it is what we would characterize as a nonparametric estimator: the distribution need not be a member of any particular parametric family.
First, note that $m$ and $n$ are not the same thing: $m$ indexes the bins, while $n$ is the total number of observations in your sample. The meaning of $\hat f$ is that you choose an "origin" $x_0$, and then partition the entire real line at points $$\{\ldots, x_0 - 2h, x_0 - h, x_0, x_0 + h, x_0 + 2h, \ldots\}.$$ Each bin is a half-open interval $[x_0 + mh, x_0 + (m+1)h)$, so every point belongs to exactly one bin. Then you take your sample $(X_1, X_2, \ldots, X_n)$ containing a total of $n$ observations, and for each bin in your partition, you count the number of observations that fall into that bin. Of course, many of these bins will not have any observations. But for those that do, you keep a tally. Then you divide the tallies by the product of the total number of observations $n$ and the bin width $h$. This gives you the height of the histogram in each bin.
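The choose-a-bin, count, and divide steps above can be sketched in a few lines of Python. (This is just an illustrative sketch; the function and variable names are my own, not anything from Silverman.)

```python
import math

def hist_estimate(x, sample, x0, h):
    """Histogram density estimate at x: the number of observations
    in the same bin as x, divided by n * h."""
    n = len(sample)
    # x lies in the bin [x0 + m*h, x0 + (m+1)*h) for this integer m
    m = math.floor((x - x0) / h)
    lo, hi = x0 + m * h, x0 + (m + 1) * h
    count = sum(lo <= xi < hi for xi in sample)
    return count / (n * h)
```

Because the bins are half-open, every $x$ falls in exactly one bin, so the estimate is well defined everywhere (and is $0$ wherever the bin containing $x$ is empty).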
Here is a concrete example. I will choose $x_0 = 0.5$, $h = 2$, and my sample is $$\{3, 7, 2, 1, 0, 5, 10, 3, 2, 4\}.$$ Then $n = 10$. My partition looks like this: $$\{\ldots, -1.5, 0.5, 2.5, 4.5, 6.5, 8.5, 10.5, \ldots \},$$ where I have kept only those endpoints that "cover" my data, because my smallest observation is $0$ and my largest is $10$.
Now I count: In $[-1.5, 0.5)$, there is one observation from my sample. In $[0.5, 2.5)$, there are $3$ observations. And so forth. (It helps to sort the observations first.) The result is $(1, 3, 3, 1, 1, 1)$. So my histogram/density estimator is $$\hat f(x) = \begin{cases} 0 & x < -1.5 \\ \frac{1}{20} & -1.5 \le x < 0.5 \\ \frac{3}{20} & 0.5 \le x < 4.5 \\ \frac{1}{20} & 4.5 \le x < 10.5 \\ 0 & 10.5 \le x. \end{cases}$$ This is a step function, and it has the property that it integrates to $1$; thus it is a genuine density function.
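If you want to double-check the tally, a few lines of Python (again, the variable names are just illustrative) reproduce the bin counts and confirm that the total area under the step function is $1$:

```python
import math

sample = [3, 7, 2, 1, 0, 5, 10, 3, 2, 4]
x0, h = 0.5, 2
n = len(sample)

# Bin index m of each observation, where bin m is [x0 + m*h, x0 + (m+1)*h)
counts = {}
for xi in sample:
    m = math.floor((xi - x0) / h)
    counts[m] = counts.get(m, 0) + 1

# Tallies per occupied bin, left to right: (1, 3, 3, 1, 1, 1)
tallies = [counts[m] for m in sorted(counts)]

# The height in bin m is count/(n*h), so the total area, sum of
# height * h over all bins, is sum(count)/n = n/n = 1.
area = sum(c / (n * h) * h for c in counts.values())
```

The area works out to $1$ for any sample and any choice of $x_0$ and $h$, since every observation lands in exactly one bin.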
In practice, the more observations $n$ you have in your sample and the narrower your bin width $h$, the more closely the density estimator $\hat f$ will approach the "true" underlying density. (More precisely, consistency requires both $h \to 0$ and $nh \to \infty$ as $n \to \infty$: the bin width must shrink, but slowly enough that the expected number of observations per bin still grows.)