Proof that the less than ogive and the more than ogive intersect at the median


I was always told that the abscissa of the intersection of the less than ogive and the more than ogive gives the median for grouped data (with exclusive classes). I wanted to prove this.

Suppose there are $m$ class intervals. Let the $i^{th}$ class interval ($i \in [m]$) be given by $[L_i, U_i]$ and have frequency $f_i$ ($f_i \in \mathbb{N}$). Let the class size of the $i^{th}$ class interval be $h_i = U_i - L_i$. Since the classes are of exclusive type, $L_{i+1} = U_{i}$ for $i \in [m-1]$. Let $N = \sum_{k=1}^m f_k$ be the total number of data points.

Let $L(x) : [U_1,U_m] \rightarrow \mathbb{R}$ represent the function for the less than ogive. Then: $$ L(x) = \begin{cases} \sum_{k = 1}^i f_k &\text{ if } x = U_i \text{, }i \in [m] \\ \sum_{k = 1}^i f_k + \frac{f_{i+1}}{h_{i+1}}(x - U_i) &\text{ if } U_i < x < U_{i+1} \text{, }i \in [m-1] \end{cases} $$

Let $M(x) : [L_1,L_m] \rightarrow \mathbb{R}$ represent the function for the more than ogive. Then: $$ M(x) = \begin{cases} \sum_{k = i}^m f_k &\text{ if } x = L_i \text{, }i \in [m] \\ \sum_{k = i}^m f_k - \frac{f_{i}}{h_{i}}(x - L_i) &\text{ if } L_i < x < L_{i+1} \text{, }i \in [m-1] \end{cases} $$

Extend $L(x)$ and $M(x)$ to $[L_1,U_m]$ by making them constant outside their original domains. Then define $g(x) : [L_1,U_m] \rightarrow \mathbb{R}$ as: $$ g(x) = L(x) - M(x) $$

By continuity and monotonicity of $g(x)$ on $[U_1,L_m]$, it is easy to see that it has exactly one zero. Now the problem arises: how do I find the zero? Unless we know which class interval $[L_i,U_i]$ the zero belongs to, we cannot solve for the point of intersection of the line segments within a class interval. Since $g(x)$ is continuous, I was hoping to use the intermediate value theorem (IVT) to find the class interval, i.e. to show $g(L_k)g(U_k) < 0$ for some $k \in [m]$.
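As a numerical sanity check of the construction above, here is a minimal Python sketch. The class boundaries and frequencies are made up for illustration (they are not part of the question), and the helper names `interp`, `g` and `bisect_zero` are my own. It builds both ogives as piecewise-linear functions that are constant outside their knots, and then locates the zero of $g$ on $[U_1, L_m]$ by bisection, assuming $g(U_1) < 0 < g(L_m)$ holds for the data.

```python
from bisect import bisect_right

# Hypothetical grouped data (not from the question).
bounds = [0, 10, 20, 30, 40, 50]  # L_1, U_1 = L_2, ..., U_m
freqs = [5, 8, 12, 9, 6]          # f_1, ..., f_m
m = len(freqs)

def interp(xs, ys, x):
    """Piecewise-linear through the knots (xs, ys); constant outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x) - 1          # xs[j] <= x < xs[j + 1]
    t = (x - xs[j]) / (xs[j + 1] - xs[j])
    return ys[j] + t * (ys[j + 1] - ys[j])

# Less than ogive: knots (U_i, f_1 + ... + f_i).
less_x = bounds[1:]
less_y = [sum(freqs[:i + 1]) for i in range(m)]

# More than ogive: knots (L_i, f_i + ... + f_m).
more_x = bounds[:-1]
more_y = [sum(freqs[i:]) for i in range(m)]

def g(x):
    """g(x) = L(x) - M(x) on the extended domain."""
    return interp(less_x, less_y, x) - interp(more_x, more_y, x)

def bisect_zero(lo, hi, tol=1e-12):
    """Bisection for a zero of g; assumes g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

zero = bisect_zero(less_x[0], more_x[-1])  # search on [U_1, L_m]
print(zero)
```

For this particular data set the zero lands at roughly 25.83, inside the class $[20, 30]$. This only checks one example and proves nothing in general.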

As a guess, I tried the median class interval. Let $l$ be the index of the median class interval, i.e. $\sum_{k=1}^l f_k \leq \frac{N}{2} < \sum_{k=1}^{l+1} f_k$. I was able to show that $g(L_l) < 0$, but for $g(U_l)$ I get: \begin{align} g(U_l) &= L(U_l) - M(L_{l+1}) \\ &= \sum_{k=1}^l f_k - \sum_{k= l+1}^m f_k \\ &= 2\sum_{k=1}^l f_k - N \\ &= 2\left( \sum_{k=1}^l f_k - \frac{N}{2}\right) \leq 0 \end{align} So $g$ does not change sign across this interval, and the IVT does not give me a zero in it. Now what do I do? How do I find the zero? Or is there an easier approach to finding the zero of $g(x)$?
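The identity $g(U_i) = 2\sum_{k=1}^i f_k - N$ used in the computation above holds at every interior boundary $U_i$, $i \in [m-1]$, so the signs of $g$ at the class boundaries can be tabulated directly from the cumulative frequencies. A small Python sketch of that scan, on made-up data (the numbers and the variable names are my own assumptions, not from the question):

```python
from itertools import accumulate

# Hypothetical grouped data (not from the question).
bounds = [0, 10, 20, 30, 40, 50]  # L_1, U_1 = L_2, ..., U_m
freqs = [5, 8, 12, 9, 6]          # f_1, ..., f_m
N = sum(freqs)

cum = list(accumulate(freqs))     # cum[i - 1] = f_1 + ... + f_i

# g at the interior boundaries U_1, ..., U_{m-1}: g(U_i) = 2 * cum_i - N.
g_at_boundary = {bounds[i]: 2 * cum[i - 1] - N for i in range(1, len(freqs))}

# First pair of consecutive boundaries across which g goes from negative
# to non-negative; None if no such pair exists for the data.
xs = sorted(g_at_boundary)
bracket = next(
    ((a, b) for a, b in zip(xs, xs[1:])
     if g_at_boundary[a] < 0 <= g_at_boundary[b]),
    None,
)

print(g_at_boundary)
print(bracket)
```

Here the tabulated values are $-30, -14, 10, 28$ at $x = 10, 20, 30, 40$, so the sign change happens across $[20, 30]$. Which index that class gets depends on how the median class is indexed.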

[The formula for the median is $\text{Median} = L_l + h_l\frac{\frac{N}{2} - \sum_{k=1}^{l-1}f_k}{f_l}$.]
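For reference, the median formula can be evaluated with a few lines of Python. This is only a sketch: the function name `grouped_median` and the sample data are hypothetical, and I am assuming the median class is the first class whose cumulative frequency reaches $\frac{N}{2}$.

```python
def grouped_median(bounds, freqs):
    """Median of grouped data via L_l + h_l * (N/2 - cum_before) / f_l.

    bounds: class boundaries L_1, U_1 = L_2, ..., U_m (length m + 1)
    freqs:  class frequencies f_1, ..., f_m
    Assumption: the median class is the first class whose cumulative
    frequency reaches N / 2.
    """
    N = sum(freqs)
    if N == 0:
        raise ValueError("no data points")
    half = N / 2
    cum_before = 0  # f_1 + ... + f_{l-1}
    for i, f in enumerate(freqs):
        if cum_before + f >= half:
            lower = bounds[i]                  # L_l
            width = bounds[i + 1] - bounds[i]  # h_l
            return lower + width * (half - cum_before) / f
        cum_before += f
    raise ValueError("median class not found")

# Hypothetical data (not from the question).
print(grouped_median([0, 10, 20, 30, 40, 50], [5, 8, 12, 9, 6]))
```

With these numbers $N = 40$, the median class is $[20, 30]$ with 13 points before it, and the formula gives $20 + 10 \cdot \frac{20 - 13}{12} \approx 25.83$.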