Average of percentage

I know I can't simply average the percentage changes, but I'm having trouble coming up with weights for these numbers (the data are in the linked image).

Each combined row represents a group, and the bottom rows are basically the sums of R and P. I'm wondering how you can end up with 35.2% if you have 33.03%, 21.72% and 21.78%. I can't come up with weights. Is there a solution to this? The percentages are calculated as follows:

(8.36/6.29) - 1 ≈ 33.03%, (13.04/10.71) - 1 ≈ 21.72%, and so on, so we get 33.03%, 21.72% and 21.78%.
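A quick sketch of this formula in Python (the function name `pct_change` is hypothetical). Recomputed from the rounded values quoted above, it gives about 32.91% and 21.76% rather than 33.03% and 21.72%, presumably because the underlying figures carry more decimal places than shown.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change from `old` to `new`, i.e. (new / old - 1) * 100."""
    return (new / old - 1) * 100


# Recompute from the rounded values quoted in the question
print(round(pct_change(8.36, 6.29), 2))    # 32.91
print(round(pct_change(13.04, 10.71), 2))  # 21.76
```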

Best answer

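Essentially, for each group $i = 1, 2, 3$ we have the following, where the first subscript (1 or 2) distinguishes the two values being compared and the second identifies the group: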

  • Two input values $P_{1,i}$ and $P_{2,i}$
  • Two input values $R_{1,i}$ and $R_{2,i}$
  • Two derived values $Q_{1,i} = \frac{R_{1,i}}{P_{1,i}}$ and $Q_{2,i} = \frac{R_{2,i}}{P_{2,i}}$
  • One proportional increase $F_i = \frac{Q_{2,i}-Q_{1,i}}{Q_{1,i}}$

In the bottom (summed) rows, we have

  • $\sum_i P_{1,i}$ and $\sum_i P_{2,i}$
  • $\sum_i R_{1,i}$ and $\sum_i R_{2,i}$
  • Two derived values $Q_1 = \frac{\sum_i R_{1,i}}{\sum_i P_{1,i}}$ and $Q_2 = \frac{\sum_i R_{2,i}}{\sum_i P_{2,i}}$
  • One proportional increase $F = \frac{Q_2-Q_1}{Q_1}$
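These definitions translate directly into code. A minimal Python sketch, assuming the values for each period are given as equal-length lists in the same group order; the function name `pooled_vs_group_changes` is hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def pooled_vs_group_changes(
    p1: Sequence[float],
    r1: Sequence[float],
    p2: Sequence[float],
    r2: Sequence[float],
) -> tuple[list[float], float]:
    """Return ([F_1, F_2, ...], F) for groups listed in the same order.

    Per group:  Q_{t,i} = R_{t,i} / P_{t,i},  F_i = (Q_{2,i} - Q_{1,i}) / Q_{1,i}
    Pooled:     Q_t = sum(R_t) / sum(P_t),    F = (Q_2 - Q_1) / Q_1
    """
    if not (len(p1) == len(r1) == len(p2) == len(r2)):
        raise ValueError("all four sequences must have one entry per group")

    group_changes = []
    for p1_i, r1_i, p2_i, r2_i in zip(p1, r1, p2, r2):
        q1_i = r1_i / p1_i
        q2_i = r2_i / p2_i
        group_changes.append((q2_i - q1_i) / q1_i)

    # The pooled ratios come from the summed rows, not from the F_i
    q1 = sum(r1) / sum(p1)
    q2 = sum(r2) / sum(p2)
    return group_changes, (q2 - q1) / q1
```

Note that the pooled $F$ is computed from the column sums alone; nothing in it is built from the $F_i$.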

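There is no particular reason to think that $F$ is any kind of weighted mean of the $F_i$. In fact, a weighted mean with non-negative weights always lies between the smallest and largest $F_i$, whereas your combined 35.2% is larger than all three group values, so no such weights exist. This is, I think, basically a variation of Simpson's paradox.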
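One way to make this precise (a sketch assuming all $P$ and $R$ are positive; the symbols $w_{t,i}$, $A$ and $u_i$ are introduced here only for this derivation): each summed ratio is a $P$-weighted mean of the group ratios, but the weights differ between the two columns being compared.

$$ Q_t = \frac{\sum_i R_{t,i}}{\sum_i P_{t,i}} = \sum_i w_{t,i}\, Q_{t,i}, \qquad w_{t,i} = \frac{P_{t,i}}{\sum_j P_{t,j}}, \quad t = 1, 2. $$

Substituting $Q_{2,i} = Q_{1,i}(1 + F_i)$ and writing $A = \sum_i w_{2,i}\, Q_{1,i}$ gives

$$ 1 + F = \frac{Q_2}{Q_1} = \frac{A}{Q_1}\left(1 + \sum_i u_i F_i\right), \qquad u_i = \frac{w_{2,i}\, Q_{1,i}}{A}. $$

The $u_i$ are non-negative and sum to 1, so $\sum_i u_i F_i$ is a genuine weighted mean of the $F_i$. The extra factor $A/Q_1$ captures the shift in the $P$ shares: it compares the first-column ratios re-weighted with second-column shares against the actual first-column pooled ratio. If the shares are the same in both columns, then $A = Q_1$, $u_i = R_{1,i} / \sum_j R_{1,j}$, and $F$ is exactly the $R_1$-weighted mean of the $F_i$; otherwise it generally is not.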


Here's a simpler example with only two groups (two rows each) that illustrates the issue:

$$ \begin{array}{|c|c|c|c|} \hline P & R & R/P & \text{percent difference} \\ \hline 5 & 20 & 4 & \\ 6 & 30 & 5 & \frac{5-4}{4} = 25\% \\ \hline 1 & 10 & 10 & \\ 10 & 130 & 13 & \frac{13-10}{10} = 30\% \\ \hline 5+1 = 6 & 20+10 = 30 & 5 & \\ 6+10 = 16 & 30+130 = 160 & 10 & \frac{10-5}{5} = 100\% \\ \hline \end{array} $$
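The table can be reproduced with exact arithmetic. This Python sketch (the names `groups` and `ratio_change` are hypothetical) uses `fractions.Fraction` to avoid rounding, and also confirms that the combined change exceeds both group changes, which no weighted mean with non-negative weights could do.

```python
from fractions import Fraction

# (P, R) pairs per group: (first row, second row), taken from the table
groups = [
    ((5, 20), (6, 30)),
    ((1, 10), (10, 130)),
]


def ratio_change(before, after):
    """Proportional change in R/P from `before` to `after`, each a (P, R) pair."""
    q1 = Fraction(before[1], before[0])
    q2 = Fraction(after[1], after[0])
    return (q2 - q1) / q1


group_changes = [ratio_change(b, a) for b, a in groups]

# Sum P and R separately over the groups, as in the bottom rows of the table
pooled_before = (sum(b[0] for b, _ in groups), sum(b[1] for b, _ in groups))
pooled_after = (sum(a[0] for _, a in groups), sum(a[1] for _, a in groups))
pooled_change = ratio_change(pooled_before, pooled_after)

print([float(c) for c in group_changes], float(pooled_change))  # [0.25, 0.3] 1.0
# A non-negative weighted mean can never exceed the largest group change
print(pooled_change > max(group_changes))  # True
```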

The seeming "paradox" in this example arises from a few properties of the data:

  • The first set of data has smaller ratios, and its scale barely changes: $P$ goes from 5 to 6.
  • The second set of data has larger ratios, and its scale increases dramatically: $P$ goes from 1 to 10.
  • When we sum the data, the ratio in the first summed row (5) is dominated by the first set of data, whose ratio is smaller, while the ratio in the second summed row (10) is dominated by the second set, whose ratio is larger. Therefore, the combined ratio increases by more than the ratio of either set does on its own.
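In the example, each summed ratio is the mean of the two group ratios weighted by each group's share of $P$ in that row:

$$ Q_1 = \tfrac{5}{6}\cdot 4 + \tfrac{1}{6}\cdot 10 = 5, \qquad Q_2 = \tfrac{6}{16}\cdot 5 + \tfrac{10}{16}\cdot 13 = 10. $$

The first set carries $5/6$ of the weight in the first summed row but only $6/16$ in the second, so the combined ratio moves toward the second set's larger ratios.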

In your data, the first set of data has smaller ratios and a modest increase in scale, whereas the other two sets of data have larger ratios and more dramatic increases in scale. It's less stark than in the example I constructed, but that's more or less what's happening.