Is there a kind of line chart for plotting both the mean and individual data?

I'm just getting into data analysis and have learned that "dynamite charts" should be avoided.

A better chart would be a boxplot plus a scatter/strip chart, which can show both aggregated and individual data.

That seems like a good alternative to bar charts.

Is there an equivalent kind of chart for line charts?

For example, I have data on male and female runners' velocities over 10 seconds. I want to graph the aggregated mean velocity over time to compare males and females, but I also want to show the individual data on the same chart to see whether there are any outliers.

Thank you.

There is 1 best solution below

There is some confusion here among $[1]$ graphical displays for categorical variables (Female vs Male, Ethnicity, Political preference, etc.), $[2]$ displays for numerical variables (running speed, age, systolic blood pressure, etc.), and $[3]$ displays for both kinds of variables at once. I think you want $[3],$ but I have to say I have no idea what a 'dynamite chart' is or what you mean by 'aggregated mean velocity' over time. My guess is that you want to compare the running speeds of women and men.

I will illustrate with fake data:

Suppose running speeds for 20 women (w) and 20 men (m) are as follows:

w = c( 7.6, 12.0, 14.9, 12.0, 16.1, 14.8,  9.2, 17.5,  9.8, 13.0,
      13.8, 14.3, 14.1, 12.8,  9.3, 12.5, 16.6, 15.2, 16.6, 12.5)
m = c(16.9, 12.9, 17.2, 12.0, 15.9, 17.3, 13.2, 16.8, 15.8, 15.6,
      12.3, 15.8,  9.8, 13.2, 20.0, 19.0, 14.6, 18.3, 17.2, 11.5)
speed = c(w, m);  gnd = rep(1:2, each=20)   # gnd: 1 = women, 2 = men

Boxplots by Gender. Here is how to make 'parallel' boxplots for comparing speeds by gender, using R statistical software. (In case you don't have it, you can get it free from www.r-project.org. It does more complicated things, but this much is easy. You can view c as combining individual observations into a vector; speed has all 40 observations; gnd has 1 for women and 2 for men.) Other statistical software will make similar plots. A 'boxplot' is suitable for a sample of about a dozen or more observations.

boxplot(speed ~ gnd, horizontal=TRUE, col="wheat",
    main="Running Speeds of Women (1) and Men (2)")
stripchart(speed ~ gnd, pch=20, add=TRUE)   # overlay individual observations
abline(v = mean(w), col="pink", lwd=2);  abline(v = mean(m), col="skyblue2", lwd=2)

Stripcharts by Gender. Because boxplots do not show how many observations there are, you should always accompany a boxplot with a note about the sample size. Alternatively, if there are not too many observations, you can add a 'stripchart' to the boxplot, showing the location of each observation. [If you have more than several dozen observations, a stripchart can be confusing: it might help to use vertical lines (pch="|") instead of small dots (pch=20), or to use method="stack" to stack symbols for tied observations.]
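As a rough sketch of the options just mentioned (not part of the figure below), the following reuses the fake data, reads the sample sizes from the value that boxplot returns, and draws a stacked stripchart and a tick-mark stripchart. The object name b and the plot titles are only illustrative.

```r
# Fake running speeds, copied from the data above
w = c( 7.6, 12.0, 14.9, 12.0, 16.1, 14.8,  9.2, 17.5,  9.8, 13.0,
      13.8, 14.3, 14.1, 12.8,  9.3, 12.5, 16.6, 15.2, 16.6, 12.5)
m = c(16.9, 12.9, 17.2, 12.0, 15.9, 17.3, 13.2, 16.8, 15.8, 15.6,
      12.3, 15.8,  9.8, 13.2, 20.0, 19.0, 14.6, 18.3, 17.2, 11.5)
speed = c(w, m);  gnd = rep(1:2, each=20)   # 1 = women, 2 = men

# boxplot() returns its summary statistics; plot=FALSE skips the drawing
b = boxplot(speed ~ gnd, plot=FALSE)
b$n    # sample size in each group, for the note accompanying the boxplot

# Stack tied observations instead of overplotting them
stripchart(speed ~ gnd, method="stack", pch=20,
    main="Stacked Stripchart: Women (1) and Men (2)")

# Vertical ticks instead of dots, which may read better for larger samples
stripchart(speed ~ gnd, pch="|",
    main="Tick-Mark Stripchart: Women (1) and Men (2)")
```

With only 20 observations per group the plain dots are probably enough; the variants matter more as the sample grows.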

The heavy line in the center of the box of a boxplot shows the location of the sample median. Because you are interested in comparing sample means, I have added vertical lines at the means: pink for women (13.23), and blue for men (15.27). [Statements in the last line of code.]

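[Figure: horizontal boxplots of running speeds for women (1) and men (2), with individual observations overlaid as dots and vertical lines at the two sample means.]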

I think this kind of chart is easy to understand and shows what you need to show. For my fake data, even though men averaged a little faster than women, there is a considerable overlap in speeds, and only three of 20 men ran faster than the fastest woman. [I confess, I have no idea what you mean by a 'line chart'; according to some definitions it is similar to an ECDF plot.]
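The numerical claims above (the two sample means and the number of men faster than the fastest woman) can be checked directly from the fake data. This is just a minimal sketch; the names fastest_w and n_faster are my own.

```r
# Fake running speeds, copied from the data above
w = c( 7.6, 12.0, 14.9, 12.0, 16.1, 14.8,  9.2, 17.5,  9.8, 13.0,
      13.8, 14.3, 14.1, 12.8,  9.3, 12.5, 16.6, 15.2, 16.6, 12.5)
m = c(16.9, 12.9, 17.2, 12.0, 15.9, 17.3, 13.2, 16.8, 15.8, 15.6,
      12.3, 15.8,  9.8, 13.2, 20.0, 19.0, 14.6, 18.3, 17.2, 11.5)

mean(w);  mean(m)          # sample means marked by the vertical lines
mean(m) - mean(w)          # how much faster the men were on average

fastest_w = max(w)         # speed of the fastest woman
n_faster = sum(m > fastest_w)
n_faster                   # number of men faster than the fastest woman
```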

Empirical CDFs. An 'empirical cumulative distribution function' (ECDF) starts at $0$ on the left and jumps by $1/n$ at each of the $n$ observations as you move to the right, until it reaches $1$ at the right. Here is how to compare ECDFs for the women and men. For large $n$ an ECDF approximates the CDF of the population. [If you use them in a paper for non-statisticians, you should probably provide a sentence or two of explanation.]

plot(ecdf(w), xlim=range(c(w, m)),   # x-axis covers both samples, so the men's ECDF is not cut off
    ylab="Cumulative Fraction", xlab="Running Speed",
    main="ECDFs of Running Speeds: Women (top) and Men")
lines(ecdf(m))  # 'lines' overlays onto an existing plot
abline(v = mean(w), col="pink", lwd=2);  abline(v = mean(m), col="skyblue2", lwd=2)

[Figure: ECDFs of running speeds for women and men, with vertical lines at the two sample means.]
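To make the description of the ECDF concrete, here is a small sketch that evaluates the ECDFs numerically instead of plotting them. It relies on the fact that ecdf returns a function; the names Fw, Fm, x, by_hand and jumps are only illustrative. Note that tied observations produce a jump of $2/n$ rather than $1/n.$

```r
# Fake running speeds, copied from the data above
w = c( 7.6, 12.0, 14.9, 12.0, 16.1, 14.8,  9.2, 17.5,  9.8, 13.0,
      13.8, 14.3, 14.1, 12.8,  9.3, 12.5, 16.6, 15.2, 16.6, 12.5)
m = c(16.9, 12.9, 17.2, 12.0, 15.9, 17.3, 13.2, 16.8, 15.8, 15.6,
      12.3, 15.8,  9.8, 13.2, 20.0, 19.0, 14.6, 18.3, 17.2, 11.5)

Fw = ecdf(w);  Fm = ecdf(m)     # each is a function of speed
n = length(w)

Fw(min(w) - 1);  Fw(max(w))     # 0 left of all observations, 1 at the largest

# ECDF height at each distinct value, straight from the definition:
# the fraction of observations less than or equal to that value
x = sort(unique(w))
by_hand = sapply(x, function(v) mean(w <= v))
all.equal(Fw(x), by_hand)

# Jump sizes in units of 1/n (a value observed twice gives a jump of 2)
jumps = diff(c(0, Fw(x)))
round(jumps * n)

# Comparing groups at one speed: fraction at or below 15 in each sample
Fw(15);  Fm(15)
```

At any given speed, the higher curve belongs to the group with the larger fraction of runners at or below that speed, which is why the women's ECDF mostly lies above the men's.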