How to make a histogram with several variables

I have to make a histogram with the following data:

GDP: CONSTANT VALUES (2008=100)

Sector          2003      2004      2005      2006      2007
Agriculture     532918    543230    532043    562146    585812
Mining          1236807   1258769   1263937   1250930   1235517
Construction    1505948   1598346   1645017   1785796   1874591
Manufacturing   6836256   7098173   7302589   7731867   7844533
Wholesale       8635763   918174    966467    1037362   1070758

I know the rules and steps for making a histogram of very simple data (one variable measured in a single year), like this:

age of members of group A in 2013
12 13 13 57 57 90 56 32 12 34 
16 23 23  23 14 67 89 90 35 92

The problem is that I am confused: the former is a time series with several variables, each measured in several years, and I do not know how to make one histogram that graphs all the data together.

Could you please help me?

Many thanks in advance.

There are 1 best solutions below

Categorical data. I think you're asking about a bar chart (for categorical data) rather than a histogram (for numerical data). I suggest a bar chart with a separate bar for each year. The total height of each bar should be the total of all sectors for that year. Then the bar can be segmented with stacked rectangles showing the contribution of each sector to the total. If color is available each sector can have a different color.
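A minimal sketch of such a stacked bar chart in base R, assuming the figures are typed in exactly as posted in the question. The object names (`gdp`, `gdp.k`, `cols`) and the color choices are illustrative, not part of the original answer. The 2003 Wholesale value looks out of line with later years and may be a typo in the post; it is copied here unchanged.

```r
# GDP by sector (rows) and year (columns), copied as posted in the question
gdp <- matrix(
  c( 532918,  543230,  532043,  562146,  585812,
    1236807, 1258769, 1263937, 1250930, 1235517,
    1505948, 1598346, 1645017, 1785796, 1874591,
    6836256, 7098173, 7302589, 7731867, 7844533,
    8635763,  918174,  966467, 1037362, 1070758),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Agriculture", "Mining", "Construction", "Manufacturing", "Wholesale"),
    as.character(2003:2007)))

gdp.k <- gdp / 1000   # express in thousands for a more readable axis
cols  <- c("maroon", "green3", "blue", "orange", "grey60")  # one color per sector

# For a matrix, barplot() draws one stacked bar per column (here, per year)
barplot(gdp.k, col = cols,
        legend.text = rownames(gdp.k),
        args.legend = list(x = "topright", cex = 0.8),
        xlab = "Year", ylab = "GDP (thousands)",
        main = "GDP by sector, 2003-2007")
```

Passing `beside = TRUE` to `barplot()` would instead draw the sectors side by side within each year, which makes individual sectors easier to compare but hides the yearly total.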

Here is an example from Minitab statistical software. You might want to express the data in thousands to make the vertical scale easier to read.

[Figure: stacked bar chart of GDP by sector, one bar per year, made in Minitab]

Another option would be a sequence of five pie charts, each with five colored sectors.
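The pie-chart option could be sketched in base R roughly as follows, again assuming the figures as posted in the question. The names `gdp`, `share`, and `cols`, the 2-by-3 panel layout, and the use of the sixth panel for a legend are all illustrative choices.

```r
# Same data as posted in the question: sectors in rows, years in columns
gdp <- matrix(
  c( 532918,  543230,  532043,  562146,  585812,
    1236807, 1258769, 1263937, 1250930, 1235517,
    1505948, 1598346, 1645017, 1785796, 1874591,
    6836256, 7098173, 7302589, 7731867, 7844533,
    8635763,  918174,  966467, 1037362, 1070758),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Agriculture", "Mining", "Construction", "Manufacturing", "Wholesale"),
    as.character(2003:2007)))

share <- prop.table(gdp, margin = 2)  # each sector's share of its year's total
cols  <- c("maroon", "green3", "blue", "orange", "grey60")

op <- par(mfrow = c(2, 3), mar = c(1, 1, 2, 1))  # five pies plus one legend panel
for (yr in colnames(share)) {
  pie(share[, yr], labels = NA, col = cols, main = yr)
}
plot.new()                                        # empty sixth panel for the legend
legend("center", legend = rownames(share), fill = cols, bty = "n")
par(op)                                           # restore previous settings
```

Pie charts show only shares, so the change in total GDP from year to year is not visible in this display.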

Numerical data. If you have data on the productivity of several hundred midsized firms for each of three years, you might make a sequence of histograms. Here is some basic code to do that in R statistical software:

    prod.2015 = rnorm(500, 100, 15)  # Three lines of code to generate fake data
    prod.2016 = rnorm(500, 115, 20)
    prod.2017 = rnorm(500, 235, 18)
    mx = max(prod.2015, prod.2016, prod.2017)  # min and max to put all
    mn = min(prod.2015, prod.2016, prod.2017)  #   three on same scale
    par(mfrow=c(3,1))  # 3 panels vertically
      hist(prod.2015, prob=TRUE, col="maroon", xlim=c(mn,mx))  # prob=TRUE: density scale
      hist(prod.2016, prob=TRUE, col="green3", xlim=c(mn,mx))
      hist(prod.2017, prob=TRUE, col="blue", xlim=c(mn,mx))
    par(mfrow=c(1,1))  # return to single-panel graphs

[Figure: three histograms of productivity (2015, 2016, 2017) stacked vertically on a common horizontal scale]

Sometimes people try to put information for all three years on the same histogram, but I have never been happy with the results.

Another possibility is to put estimated density functions for productivity on the same graph (for a more compact display), using different colors. Here is the code and result.

    hdr = "Productivity for 2015 (maroon), 2016 (green), and 2017 (blue)"
    plot(density(prod.2015), col="maroon", lwd=2, xlim=c(mn,mx),
        xlab="Productivity", main=hdr)
     lines(density(prod.2016), col="green3", lwd=2)  # add later years to the same plot
     lines(density(prod.2017), col="blue", lwd=2)

[Figure: estimated density curves of productivity for the three years on a single plot]

Intuitively, you can view density estimators as 'smoothed histograms'.
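One way to see the 'smoothed histogram' idea is to draw a density estimate on top of a density-scale histogram of the same data. The sketch below uses fake data; the seed and the names `x`, `h`, and `d` are arbitrary illustrative choices.

```r
set.seed(1)                     # arbitrary seed, only so the figure is reproducible
x <- rnorm(500, 100, 15)        # fake productivity data for one year

h <- hist(x, plot = FALSE)      # compute the histogram without drawing it
d <- density(x)                 # kernel density estimate (default Gaussian kernel)

# Draw the histogram on the density scale, leaving room for the curve's peak
plot(h, freq = FALSE, col = "grey90",
     ylim = c(0, max(h$density, d$y)),
     xlab = "Productivity", main = "Histogram with density estimate")
lines(d, col = "maroon", lwd = 2)
```

Both displays enclose a total area of about 1, which is why they can share a vertical axis.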

Note: Minitab is easier to use and makes some nice graphs, but flexibility as to style and colors is somewhat limited. R requires a bit of programming, but the programming makes it easier to tailor graphs to specific needs. There is a vast variety of software for statistical graphics, also varying vastly in ease of use and appropriateness of results.