Comparing areas under a histogram and its PDF


Assumption: we have a histogram that follows the shape of a normal distribution. The y-axis of this histogram is the relative frequency, and the bin width is 5. My thought process (please tell me which step is wrong):

  1. The area of the histogram is then 5 (the bar heights sum to 1 and each bar has width 5).
  2. We approximate this histogram by a continuous distribution.
  3. This continuous distribution is a bell-shaped normal curve that is almost identical in shape to the histogram.
  4. This continuous distribution is also called the PDF.
  5. The area under the continuous distribution is almost equal to the area under the histogram.
  6. But we know the area under the PDF is 1, whereas the area under the histogram is 5.

I'm getting really confused trying to convert the histogram into a PDF because of this. Could someone please point out which of my assumptions is wrong?


Comments:

I think you are confused about the heights and areas of the histogram bars. Many software packages can use a 'density' scale instead of a 'relative frequency' scale, so that the total area of all histogram bars is unity.

I don't know the exact terminology of your book and I don't want to risk adding to the confusion with a direct answer.

Here is a 'density' histogram from R statistical software of a dataset of size n = 1000, generated from NORM(mean=50, sd=5). Bin widths are 5. What is the area of each bar? How are densities (heights of bars) computed?

How would the vertical axis be labeled if this were a 'relative frequency histogram'?

Each bar is labeled with its density (slightly rounded). Some information about the histogram (from a non-plotting version) is also provided.

I hope this is enough hints and information so you can answer your own question.

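 x = rnorm(1000, 50, 5)       # 1000 draws from NORM(mean=50, sd=5)
 cutpt = seq(20, 80, by=5)    # bin boundaries; every bin has width 5
 # density-scale histogram with each bar labeled by its height
 hist(x, probability=TRUE, labels=TRUE, breaks=cutpt, ylim=c(0, .1), col="skyblue")
 # overlay the NORM(50, 5) density curve
 curve(dnorm(x, 50, 5), col="darkgreen", lwd=2, add=TRUE)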

 hist(x, probability=TRUE, breaks=cutpt, plot=FALSE)
 $breaks
  [1] 20 25 30 35 40 45 50 55 60 65 70 75 80

 $counts
  [1]   0   0   1  21 134 336 327 154  25   2   0   0

 $density
  [1] 0.0000 0.0000 0.0002 0.0042 0.0268 0.0672 0.0654 0.0308 0.0050
 [10] 0.0004 0.0000 0.0000
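
If you want to check your answers to the questions above, here is a minimal sketch that recomputes both scales from the counts printed in the output. The variable names (`counts`, `width`, `rel_freq`, `density`, `area_rel_freq`, `area_density`) are mine and are only for illustration; the sketch assumes every bin has width 5, as in this example.

```r
# Counts and bin width copied from the hist() output above.
counts <- c(0, 0, 1, 21, 134, 336, 327, 154, 25, 2, 0, 0)
width  <- 5
n      <- sum(counts)                     # number of observations

# Relative-frequency scale: bar height = count / n.
rel_freq      <- counts / n
area_rel_freq <- sum(rel_freq * width)    # heights sum to 1, so area = width

# Density scale: bar height = count / (n * width).
density      <- counts / (n * width)
area_density <- sum(density * width)      # total area of the density bars

print(density)          # compare with $density above
print(area_rel_freq)
print(area_density)
```

Up to floating-point rounding, the two areas should come out as 5 and 1 respectively, so the two scales differ only by the factor of the bin width.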

[Figure: density histogram of x, bars labeled with their densities, with the NORM(50, 5) density curve overlaid]