How to create a frequency table with raw frequencies, cumulative raw frequencies, relative frequencies, and cumulative percentage?

I am doing some statistics work and would like to know how to create a frequency table that describes the data with the following:

  • Raw frequency
  • Cumulative raw frequency
  • Relative frequency
  • Cumulative percentage

Essentially, I am using the General Social Survey (GSS) data from Berkeley's SDA website as my data set.

Here is a link to the data set I am working with (number of sexual partners):

Link to the SDA data-set I've set-up

I recoded the groups as follows:

  • Partners: 0
  • Partners: 1
  • Partners: 2
  • Partners: 3
  • Partners: 4
  • Partners: 5
  • Partners: 6
  • Partners: 7
  • Partners: 8
  • Partners: 9
  • Partners: 10
  • Partners: 11-100

I was asked to recode the 'partners' variable so that the largest category is 11 or more partners. I wasn't sure how to do this on the SDA website, so I recoded it as 11-100, which I think is a large enough upper limit.

Thank you in advance

There are 1 best solutions below

On BEST ANSWER

Here are data for values of a Poisson random variable with mean 3. There are $n = 15$ values. [We used R statistical software to generate the data and to do the computations. In practical applications of statistics such software is in widespread use. In case you are interested, we show some of the computer instructions, but you can ignore them if you prefer.]

y = rpois(15, 3)
table(y)
y
1 2 3 4 6 7   values 
4 2 4 3 1 1   raw frequencies

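We put this information into a vector of values and a vector of raw frequencies, including a frequency of 0 for the value 5, which lies between the smallest and largest observed values but did not occur.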

values = 1:7
raw.f = c(4,2,4,3,0,1,1)

Notice that for a Poisson random variable any value from 0 upward is possible. Our smallest observed value, 1, occurred four times. The value 5 was not observed, so its raw frequency is 0. The total of the raw frequencies must be the sample size.
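
If the raw observations are still available, the zero count for an unobserved value need not be typed by hand. The following is a minimal sketch of one common way to do it, not the only one. It assumes the observations are in a vector `y`; because the original random draw was made without a seed, `y` is rebuilt here from the counts shown above, so the order of the observations is an assumption.

```r
# Rebuild a sample with the same counts as in the table above
# (the order of the original random draw is unknown).
y = rep(c(1, 2, 3, 4, 6, 7), times = c(4, 2, 4, 3, 1, 1))

# Declaring every value from 1 to 7 as a factor level makes table()
# report a count of 0 for values that never occurred (here, 5).
values = 1:7
raw.f = as.vector(table(factor(y, levels = values)))

raw.f
sum(raw.f)   # should equal the sample size, length(y)
```

The result should agree with the vector `raw.f` entered by hand above.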

n = sum(raw.f);  n
## 15

We use the raw frequencies and $n$ to find cumulative frequencies, relative frequencies, and cumulative relative frequencies. First, here is a table of the results, which we discuss later. (Row numbers are given in brackets [ ] at the start of each line; ignore the commas.)

cbind(values, raw.f, cum.f, rel.f, cum.rel.f)
     values raw.f cum.f      rel.f cum.rel.f
[1,]      1     4     4 0.26666667 0.2666667
[2,]      2     2     6 0.13333333 0.4000000
[3,]      3     4    10 0.26666667 0.6666667
[4,]      4     3    13 0.20000000 0.8666667
[5,]      5     0    13 0.00000000 0.8666667
[6,]      6     1    14 0.06666667 0.9333333
[7,]      7     1    15 0.06666667 1.0000000

To get cumulative frequencies we accumulate the number of observations as we go down the column. For example the cumulative frequency on row [3] is 10, that is the sum $4 + 2 + 4 = 10.$ The cumulative frequencies on rows [4] and [5] are the same because there were $0$ occurrences of value 5. The cumulative frequency in the bottom row must always be $n$ because by the last row all of the observations have been accumulated. [Computer code for making the cumulative frequencies is cum.f = cumsum(raw.f).]

Relative frequencies are obtained by dividing each raw frequency by $n=15.$ For example, the relative frequency on row [4] is the raw frequency 3 divided by 15 to give $3/15 = 1/5 = 0.2$ and the relative frequency on row [6] is $1/15 \approx 0.0667.$ [The computer code for making the relative frequencies is rel.f = raw.f/n.]

The cumulative relative frequencies are the cumulative frequencies divided by $n.$ For example, the cumulative relative frequency on row [2] is the cumulative frequency 6 divided by $n = 15$ to give $6/15 = 2/5 = 0.4.$ The last row for cumulative relative frequency must always be $1$ because by the last row we have accumulated $1 = 100\%$ of the data. [The computer code for making the cumulative relative frequencies is cum.rel.f = cumsum(rel.f).]
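
The question also asks for a cumulative percentage, which is just the cumulative relative frequency multiplied by 100. As a rough sketch, the four columns can be collected in a data frame; the names `freq.table` and `cum.pct` are illustrative choices, not part of the computation above.

```r
values = 1:7
raw.f  = c(4, 2, 4, 3, 0, 1, 1)
n      = sum(raw.f)

freq.table = data.frame(
  value   = values,
  raw.f   = raw.f,
  cum.f   = cumsum(raw.f),
  rel.f   = raw.f / n,
  cum.pct = 100 * cumsum(raw.f) / n   # cumulative percentage
)

# Print with rounding so the percentages are easy to read.
print(round(freq.table, 3), row.names = FALSE)

# Spot checks against the rows discussed in the text.
freq.table$cum.f[3]     # 4 + 2 + 4
freq.table$rel.f[4]     # 3/15
freq.table$cum.pct[2]   # 100 * 6/15
```

The last entry of `cum.pct` should be 100, for the same reason that the last cumulative relative frequency is 1.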

Please look at all parts of the table carefully--perhaps with a calculator at hand. If there is anything you can't understand after careful consideration, leave a Comment and I (or someone else) will try to explain.

You can use values and raw frequencies to find the mean $\bar X$ of a sample. If the values are denoted by $v_i$ and the raw frequencies by $f_i,$ then the mean of the sample is $$\bar X = \frac 1 n \sum_i f_i v_i.$$ For our data the result is $\bar X = 3.$ We sampled our values from a population with mean $\mu = 3,$ and so it is reasonable to expect that the sample mean would be somewhere near 3. It happens that our data gave an exact match between population and sample mean, but that doesn't always happen.

Another formula for $\bar X$ uses values $v_i$ and relative frequencies $r_i = f_i/n.$ The formula is $\bar X = \sum_i r_i v_i.$ Because each frequency has been divided by $n$ to get a relative frequency, we don't need to divide the sum by $n.$

sum(values*raw.f)/n
## 3
sum(values*rel.f)
## 3
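
Written out by hand for the table above, the first formula amounts to the following arithmetic (the value 5 contributes nothing because its raw frequency is 0):

$$\bar X = \frac{1}{15}\left(4\cdot 1 + 2\cdot 2 + 4\cdot 3 + 3\cdot 4 + 0\cdot 5 + 1\cdot 6 + 1\cdot 7\right) = \frac{45}{15} = 3.$$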

Note: The computer instructions for getting cumulative, relative, and cumulative relative frequencies from raw frequencies and for making the table are summarized below.

values = 1:7
raw.f = c(4,2,4,3,0,1,1)
cum.f = cumsum(raw.f)
n = sum(raw.f)
rel.f = raw.f/n
cum.rel.f = cumsum(rel.f)
cbind(values, raw.f, cum.f, rel.f, cum.rel.f)
     values raw.f cum.f      rel.f cum.rel.f
[1,]      1     4     4 0.26666667 0.2666667
[2,]      2     2     6 0.13333333 0.4000000
[3,]      3     4    10 0.26666667 0.6666667
[4,]      4     3    13 0.20000000 0.8666667
[5,]      5     0    13 0.00000000 0.8666667
[6,]      6     1    14 0.06666667 0.9333333
[7,]      7     1    15 0.06666667 1.0000000
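
For the 'partners' variable in the question, the same steps apply once the largest category is made open-ended. The sketch below does this in R rather than on the SDA website, assuming the counts have been exported into a vector. The vector `partners` and its values are hypothetical and made up for illustration; they are not GSS data.

```r
# Hypothetical partner counts for illustration only; these are NOT GSS data.
partners = c(0, 1, 1, 2, 0, 5, 12, 1, 3, 40, 1, 0, 11, 2, 1)

# Top-code: every count of 11 or more falls into one open-ended category,
# so no upper limit such as 100 has to be guessed.
top.coded = pmin(partners, 11)
lab       = c(0:10, "11+")
grp       = factor(top.coded, levels = 0:11, labels = lab)

raw.f = as.vector(table(grp))
n     = sum(raw.f)

tab = data.frame(
  partners = lab,
  raw.f    = raw.f,
  cum.f    = cumsum(raw.f),
  rel.f    = round(raw.f / n, 3),
  cum.pct  = round(100 * cumsum(raw.f) / n, 1)
)
print(tab, row.names = FALSE)
```

Using `pmin` here is one design choice among several; `cut` with an upper break of `Inf` would give the same grouping.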