Assumption: we have a histogram that follows the shape of a normal distribution. The y-axis of this histogram is relative frequency. The bin width is 5. My thought process (please correct me on whichever step is wrong):
- The area of the histogram is then 5 (the relative frequencies sum to 1, and every bar has width 5).
- We approximate this histogram into a continuous distribution.
- This continuous distribution is a bell-shaped normal distribution that is almost identical in shape to the histogram.
- This continuous distribution is also called the PDF.
- The area under the continuous distribution is almost equal to the area under the histogram.
- But we know the area under the PDF is 1, whereas the area under the histogram is 5.
I'm getting really confused trying to convert the histogram into a PDF because of this. Could someone please point out which of my assumptions is wrong?
Comments:
I think you are confused about the heights and areas of the histogram bars. Most software packages use a 'density' scale instead of a 'relative frequency' scale so that the total area of all histogram bars will be unity.
I don't know the exact terminology of your book and I don't want to risk adding to the confusion with a direct answer.
Here is a 'density' histogram from R statistical software of a dataset of size n = 1000, generated from NORM(mean=50, sd=5). Bin widths are 5. What is the area of each bar? How are densities (heights of bars) computed?
How would the vertical axis be labeled if this were a 'relative frequency histogram'?
Each bar is labeled with its density (slightly rounded). Some information about the histogram (from a non-plotting version) is also provided.
I hope this is enough hints and information so you can answer your own question.
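If you would like to check your answer numerically, here is a minimal sketch in plain Python. It is not the R code behind the histogram above: the seed, the variable names, and the choice of bin edges at multiples of 5 are my own assumptions for illustration. It draws 1000 values from a normal distribution with mean 50 and sd 5, bins them with width 5, and compares the total bar area on the two vertical scales.

```python
import random

random.seed(0)          # arbitrary seed, chosen only for reproducibility
n = 1000                # sample size
bin_width = 5

# Simulated data from a normal distribution with mean 50 and sd 5.
data = [random.gauss(50, 5) for _ in range(n)]

# Count observations per bin; bin k covers [k*bin_width, (k+1)*bin_width).
counts = {}
for x in data:
    k = int(x // bin_width)
    counts[k] = counts.get(k, 0) + 1

# Relative-frequency scale: bar height = count / n.
rel_freq = {k: c / n for k, c in counts.items()}

# Density scale: bar height = relative frequency / bin width.
density = {k: rf / bin_width for k, rf in rel_freq.items()}

# Total area = sum of (height * width) over all bars.
area_rel_freq = sum(h * bin_width for h in rel_freq.values())
area_density = sum(h * bin_width for h in density.values())

print("area on relative-frequency scale:", area_rel_freq)
print("area on density scale:", area_density)
```

Up to floating-point rounding, the first area equals the bin width and the second equals 1, so only the density-scale histogram is directly comparable to a PDF.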