Stem and Leaf Plot for Strongly Skewed Data


How do I draw a stem-and-leaf plot for the following large data set? The data are for different EU states, so the values differ a lot. I have to make a stem-and-leaf plot for each table, i.e. area, population and motorway length. How do I arrange the data in a stem-and-leaf plot? How many stems would there be (say, for motorway length)? Do I have to split the data? Do I have to round or truncate the values?

I have arranged the motorway lengths in ascending order:

0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810, 897, 1295, 1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127, 2631, 2988, 3686, 6726, 11465, 12917, 14701

[Image: data tables (area, population, motorway length) for the EU states]

Best answer

I doubt that a stemplot is the best way to visualize these data, so I would not recommend one way of setting up the stems over another. This is particularly so because the data are strongly skewed to the right, spanning about three orders of magnitude (the positive values run from 11 to 14701).

Nevertheless, stemplots can be made. Here is the default stemplot of these 32 observations from R, followed by the one from Minitab. In the Minitab display, the line with depth (8) holds eight observations, including the two middle values 1295 and 1340, whose average 1317.5 is the median.

   R:
   The decimal point is 3 digit(s) to the right of the |

    0 | 00012334456888933455789
    2 | 01607
    4 | 
    6 | 7
    8 | 
   10 | 5
   12 | 9
   14 | 7
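As the displays show, R rounds each value to the leaf unit (100 here), while Minitab truncates: 751 and 897 get leaves 8 and 9 in R but 7 and 8 in Minitab. This bears directly on the round-or-truncate question. The sketch below is a rough imitation of the R display above, assuming a leaf unit of 100 with two consecutive stems pooled per line; the helper name rounded_stemplot is hypothetical, and this is not R's actual stem algorithm.

```python
import math
from collections import defaultdict

# Motorway lengths of the 32 EU states, copied from the question
lengths = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810, 897, 1295,
           1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127, 2631, 2988, 3686, 6726,
           11465, 12917, 14701]


def rounded_stemplot(values, leaf_unit=100, stems_per_line=2):
    """Hypothetical helper: round each value to the leaf unit, split it into
    stem (higher digits) and leaf (last digit), and pool `stems_per_line`
    consecutive stems on one display line."""
    lines = defaultdict(str)
    for x in sorted(values):
        r = math.floor(x / leaf_unit + 0.5)  # round half up; fine for x >= 0
        stem, leaf = divmod(r, 10)
        lines[stem - stem % stems_per_line] += str(leaf)
    # Include empty lines so gaps in the data stay visible
    return {k: lines.get(k, "") for k in range(0, max(lines) + 1, stems_per_line)}


for stem, leaves in rounded_stemplot(lengths).items():
    print(f"{stem:>4} | {leaves}")
```

For these data the printed lines match the R output above, with stems in thousands and leaves in hundreds.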



 Minitab:
 Stem-and-leaf of Motorway  N  = 32
 Leaf Unit = 100

  15  0   000112334567788
 (8)  1   23445778
  9   2   0169
  5   3   6
  4   4
  4   5
  4   6   7
  3   7
  3   8
  3   9
  3   10
  3   11  4
  2   12  9
  1   13
  1   14  7
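A minimal sketch of a Minitab-style display, assuming truncation to a leaf unit of 100 and one stem per line. The depth column here uses a simple rule: cumulative counts from the top above the median line, from the bottom below it, and the line's own count in parentheses where the median falls. The helper truncated_stemplot is hypothetical and only approximates Minitab's output; for these data it reproduces the display above.

```python
from statistics import median

# Motorway lengths of the 32 EU states, copied from the question
lengths = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810, 897, 1295,
           1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127, 2631, 2988, 3686, 6726,
           11465, 12917, 14701]


def truncated_stemplot(values, leaf_unit=100):
    """Hypothetical helper: truncate each value to the leaf unit,
    one stem per line, plus a depth column."""
    n = len(values)
    lines = {}
    for x in sorted(values):
        t = int(x // leaf_unit)  # truncate: drop the digits below the leaf unit
        stem, leaf = divmod(t, 10)
        lines[stem] = lines.get(stem, "") + str(leaf)
    rows, below = [], 0
    for stem in range(max(lines) + 1):
        leaves = lines.get(stem, "")
        count = len(leaves)
        above = n - below - count
        if below + count <= n / 2:      # cumulative count from the top
            depth = str(below + count)
        elif above + count <= n / 2:    # cumulative count from the bottom
            depth = str(above + count)
        else:                           # this line contains the median
            depth = f"({count})"
        rows.append((depth, stem, leaves))
        below += count
    return rows


for depth, stem, leaves in truncated_stemplot(lengths):
    print(f"{depth:>4} {stem:>3}  {leaves}")
print("median:", median(lengths))
```

The median printed at the end is 1317.5, the average of 1295 and 1340, which both sit on the (8) line.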

With the argument scale = 0.5, the R function stem returns the more compact stemplot below.

   The decimal point is 4 digit(s) to the right of the |

   0 | 0000000001111111111222222334
   0 | 7
   1 | 13
   1 | 5
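The compact display uses a leaf unit of 1000 and splits each stem over two lines, leaves 0 to 4 on the first and 5 to 9 on the second. A short sketch under those assumptions (rounding, as R does) is below; the helper split_stemplot is hypothetical and is not R's implementation.

```python
import math

# Motorway lengths of the 32 EU states, copied from the question
lengths = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810, 897, 1295,
           1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127, 2631, 2988, 3686, 6726,
           11465, 12917, 14701]


def split_stemplot(values, leaf_unit=1000):
    """Hypothetical helper: round to the leaf unit and split each stem
    into a low line (leaves 0-4) and a high line (leaves 5-9)."""
    lines = {}
    for x in sorted(values):
        r = math.floor(x / leaf_unit + 0.5)  # round half up; fine for x >= 0
        stem, leaf = divmod(r, 10)
        key = (stem, leaf >= 5)              # (stem, is_high_half)
        lines[key] = lines.get(key, "") + str(leaf)
    last_stem = max(lines)[0]
    return [(s, high, lines.get((s, high), ""))
            for s in range(last_stem + 1) for high in (False, True)]


for stem, high, leaves in split_stemplot(lengths):
    print(f"{stem:>3} | {leaves}")
```

Almost all of the detail collapses into the first line, which is why this version loses so much information.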

I do not see how to make a stemplot on any scale without losing some of the detail of the data. If information is to be lost in making a graphical presentation of the data, perhaps a histogram is a better choice. Below is a Minitab histogram of these data.

[Figure: Minitab histogram of motorway lengths]
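The frequencies behind a histogram can be tabulated directly. The sketch below assumes bins of width 2000 starting at 0; that choice is ours, not necessarily the one Minitab used for the figure, and bin_counts is a hypothetical helper.

```python
from collections import Counter

# Motorway lengths of the 32 EU states, copied from the question
lengths = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810, 897, 1295,
           1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127, 2631, 2988, 3686, 6726,
           11465, 12917, 14701]


def bin_counts(values, width):
    """Counts for the bins [k*width, (k+1)*width), k = 0, 1, 2, ..."""
    counts = Counter(int(x // width) for x in values)
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


for k, c in enumerate(bin_counts(lengths, 2000)):
    print(f"{k * 2000:>6}-{(k + 1) * 2000:<6} {'*' * c}")
```

With this bin width, 23 of the 32 states fall in the first bin, the same grouping as the first line of the default R stemplot.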

For some purposes, it might be better to make a histogram of $\log_{10}$ of motorway lengths for the 30 EU states that have motorways (the two states with length 0 must be left out, since $\log_{10} 0$ is undefined).

[Figure: histogram of $\log_{10}$ motorway lengths for the 30 EU states with motorways]
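A sketch of the log-scale tabulation, assuming bins of width 0.5 starting at 1.0 on the $\log_{10}$ scale (our choice, not taken from the figure); log_bin_counts is a hypothetical helper.

```python
import math
from collections import Counter

# Motorway lengths of the 32 EU states, copied from the question
lengths = [0, 0, 11, 140, 152, 257, 309, 392, 419, 541, 644, 751, 770, 810, 897, 1295,
           1340, 1419, 1482, 1515, 1719, 1763, 1891, 2005, 2127, 2631, 2988, 3686, 6726,
           11465, 12917, 14701]

# log10(0) is undefined, so the two states without motorways are dropped
logs = [math.log10(x) for x in lengths if x > 0]


def log_bin_counts(values, start=1.0, width=0.5):
    """Counts for the bins [start + k*width, start + (k+1)*width)."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


for k, c in enumerate(log_bin_counts(logs)):
    lo = 1.0 + 0.5 * k
    print(f"{lo:.1f}-{lo + 0.5:.1f} {'*' * c}")
```

On the log scale the distribution is much less skewed: most states fall between $10^{2}$ and $10^{3.5}$ km, with a single low value (11) and three large ones above $10^{4}$.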