Fitting a distribution to a histogram (the only data available)


I have a histogram for which I only have the frequency of each interval group (0-100, 101-200, ...); I don't have the individual observations. How do I fit a lognormal distribution to this dataset and estimate its parameters? Also, is there a name for this kind of estimation problem so I can read up more on it? Thank you!


There is 1 solution below


If we have $$X_1,...,X_n\sim \text{Lognormal}(\mu,\sigma^2)$$ then $$\ln(X_1),...,\ln(X_n)\sim\mathcal{N}(\mu,\sigma^2)$$ which means that the sample mean and sample variance of the logarithms, $$\hat{\mu}=\frac{1}{n}\sum_{j=1}^n\ln(X_j)$$ $$\hat{\sigma}^2=\frac{1}{n-1}\sum_{j=1}^n\left(\ln(X_j)-\hat{\mu}\right)^2$$ are appropriate estimators for your parameters $\mu,\sigma^2$.

If all you have is grouped data, and $f_j$ is the observed count of the class $\left[100(j-1)+1,100j\right]$ for $j=1,2,3,...,m$, I suggest you use $$\hat{\mu}=\frac{1}{\sum_{j=1}^m f_j}\times \sum_{j=1}^m \ln(100j-50)f_j$$ $$\hat{\sigma}^2=\frac{1}{\sum_{j=1}^mf_{j}-1}\sum_{j=1}^m\left(\ln(100j-50)-\hat{\mu}\right)^2f_j$$ as estimates for your parameters.

Notice that we're choosing the (approximate) midpoint $100j-50$ as a suitable representative of all the data in the class $[100(j-1)+1,100j]$, so every observation in class $j$ is treated as if it were equal to $100j-50$.
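The grouped-data estimates above can be computed directly from the vector of class counts. The following is a minimal sketch in Python with NumPy, assuming equal-width classes of width 100 starting at the first class; the function name `lognormal_from_bins` and the example counts are made up for illustration and are not part of the original question.

```python
import numpy as np

def lognormal_from_bins(counts, width=100.0):
    """Midpoint estimates of (mu, sigma^2) for a lognormal from class counts.

    counts[j-1] is the observed frequency f_j of class j, j = 1..m.
    Every observation in class j is replaced by the midpoint width*j - width/2
    (that is, 100j - 50 for the default width).
    """
    counts = np.asarray(counts, dtype=float)
    j = np.arange(1, counts.size + 1)
    log_mid = np.log(width * j - width / 2.0)   # ln(100j - 50)
    n = counts.sum()                            # total number of observations
    mu_hat = np.sum(log_mid * counts) / n
    sigma2_hat = np.sum(counts * (log_mid - mu_hat) ** 2) / (n - 1)
    return mu_hat, sigma2_hat

# Hypothetical frequencies for the classes 1-100, 101-200, ..., 601-700
example_counts = [5, 20, 30, 25, 12, 5, 3]
mu_hat, sigma2_hat = lognormal_from_bins(example_counts)
print(f"mu_hat = {mu_hat:.4f}, sigma2_hat = {sigma2_hat:.4f}")
```

Because each observation is replaced by its class midpoint, the result is only an approximation; it tends to be less reliable when the classes are wide relative to the spread of the data.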