Intuitive explanation of the formula for mode (grouped data).

The formula for finding the mode in grouped data is given by:

$$ \text{mode} = l + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h $$

where,

$ l $ = the lower limit of the modal class,

$ f_1 $ = the frequency of the modal class,

$ f_0 $ = the frequency of the class preceding the modal class,

$ f_2 $ = the frequency of the class succeeding the modal class,

$ h $ = the class width.

There's already a good answer here; an excerpt:

"Now, observe that: $$ \frac{f_1 - f_0}{2f_1 - f_0 - f_2} + \frac{f_1 - f_2}{2f_1 - f_0 - f_2} = \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} + \frac{f_1 - f_2}{(f_1 - f_0) + (f_1 - f_2)} = 1 $$ So if we want to divide an interval of width $ h $ into two pieces, where the ratio of sizes of those two pieces is $ (f_1 - f_0) : (f_1 - f_2) $, the first piece will have width $ \frac{f_1 - f_0}{2f_1 - f_0 - f_2} h $. This is what the formula for estimating the mode does. It splits the width of the modal bar into two pieces whose ratio of widths is $ (f_1 - f_0) : (f_1 - f_2) $, and it says the mode is at the line separating those two pieces, that is, at a distance $ \frac{f_1 - f_0}{2f_1 - f_0 - f_2} h $ from the left edge of that bar, $ l $."
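The split described in the excerpt can be checked numerically. The following is a small sketch using exact rational arithmetic. The helper name `split_modal_class` and the numbers are hypothetical, and integer frequencies are assumed.

```python
from fractions import Fraction


def split_modal_class(h, f0, f1, f2):
    """Split width h into two pieces in the ratio (f1 - f0) : (f1 - f2)."""
    d_left, d_right = f1 - f0, f1 - f2
    total = d_left + d_right  # equals 2*f1 - f0 - f2
    return Fraction(d_left, total) * h, Fraction(d_right, total) * h


left, right = split_modal_class(10, 4, 10, 6)
print(left, right)  # 6 4
```

The two pieces add up to $ h $, and because $ f_2 > f_0 $ in this example, the left piece is the longer one, so the dividing line lies closer to the taller neighbouring class.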

The answer does a very good job of explaining what the formula is, but it doesn't touch on:

why we'd expect the mode to be at the line separating the two pieces. Why can't the mode be somewhere else?

I understand that this is approximating, but why do we use this particular approximation?

Further, why do we use the differences between $ f_1 $ and $ f_0 $, and between $ f_1 $ and $ f_2 $:

why do we care how much higher the frequency of the modal class is than the frequencies of the classes preceding and succeeding it?

There are 1 best solutions below

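This formula is exact if the distribution is continuous with a quadratic pdf $p(u)=au^2+bu+c$ over the three classes involved, with $a<0$ so that the vertex of the parabola is a maximum. Furthermore, a quadratic is the simplest pdf that has an easily computed mode and can be fitted exactly to the three frequencies.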

Writing $L$ for the lower limit of the modal class (the $l$ of the question), we solve for $a,b,c$ in terms of $f_0, f_1, f_2$ in the equations

$$f_0 =\! \int_{L-h}^{L} p(u)du, \ \ f_1 =\! \int_{L}^{L+h} p(u)du, \ \ f_2 =\! \int_{L+h}^{L+2h} p(u)du$$

This gives $$a= \frac{f_0-2 f_1+f_2}{2 h^3},\ b= \frac{(f_1-f_0)h- (f_0-2 f_1+f_2)L}{h^3}$$

The density is maximal where $p'(u)=2au+b=0$, that is, at $u=-b/(2a)$. Substituting the values of $a$ and $b$ gives the desired formula: $$\text{mode} = \frac{-b}{2a} = L + \frac{f_1-f_0}{2f_1-f_0-f_2}h.$$
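As a sanity check on the algebra, here is a short Python sketch with exact fractions. It builds the quadratic from the expressions for $a$ and $b$ above, confirms that it reproduces the three frequencies, and compares its vertex with the formula. The function names and the numbers are made up for illustration, and $c$, which the derivation does not state, is solved here from the $f_1$ equation.

```python
from fractions import Fraction


def fit_quadratic(L, h, f0, f1, f2):
    """Coefficients (a, b, c) of p(u) = a*u**2 + b*u + c matching the class integrals."""
    L, h, f0, f1, f2 = (Fraction(x) for x in (L, h, f0, f1, f2))
    second_diff = f0 - 2 * f1 + f2
    a = second_diff / (2 * h**3)
    b = ((f1 - f0) * h - second_diff * L) / h**3
    # c is not given in the text; here it is solved from the f1 equation.
    c = (f1 - a * ((L + h)**3 - L**3) / 3 - b * ((L + h)**2 - L**2) / 2) / h
    return a, b, c


def integral(a, b, c, lo, hi):
    """Exact integral of a*u**2 + b*u + c from lo to hi."""
    antiderivative = lambda u: a * u**3 / 3 + b * u**2 / 2 + c * u
    return antiderivative(hi) - antiderivative(lo)


# Hypothetical data: modal class 10-20 with f0, f1, f2 = 4, 10, 6.
L, h, f0, f1, f2 = 10, 10, 4, 10, 6
a, b, c = fit_quadratic(L, h, f0, f1, f2)

# The fitted quadratic reproduces the three class frequencies ...
print(integral(a, b, c, L - h, L),
      integral(a, b, c, L, L + h),
      integral(a, b, c, L + h, L + 2 * h))  # 4 10 6

# ... and its vertex agrees with L + (f1 - f0) / (2*f1 - f0 - f2) * h.
print(-b / (2 * a))  # 16
```

The vertex computation divides by $a$, so the sketch assumes $f_0 - 2f_1 + f_2 \neq 0$, which holds whenever the modal class is strictly higher than at least one of its neighbours.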