mean of geometric means?

Biological samples are taken weekly, and their values are averaged each month using the geometric mean. Monthly geometric means for several years of data are thus available.

I want to compute the "average" value for each month of the year over a 5-year period. How should I average the 5 geometric means for, say, the month of July - with an arithmetic mean or with a geometric mean?

Ultimately, I want to plot a curve that best captures the average or typical change in sample values over the course of the year, based on 5 years of monthly geometric means. Thanks for your help.

There are 2 best solutions below

2
On

You average geometric means with a geometric average - in this case the July mean is the fifth root of the product of the five annual July geometric means.
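As a minimal sketch of that fifth-root calculation in Python: the five July values below are made up purely for illustration, and `geometric_mean` is a hypothetical helper name, not something from the question.

```python
import math

# Hypothetical July geometric means for five consecutive years
# (illustrative numbers only, not data from the question).
july_means = [12.0, 15.0, 9.0, 20.0, 11.0]

def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    # Averaging logarithms is equivalent to taking the n-th root of the product.
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

july_typical = geometric_mean(july_means)
print(f"Typical July value: {july_typical:.3f}")
```

Repeating this for each of the twelve months would give the twelve points of the seasonal curve.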

Edit in response to @lulu's comment on the original question: the samples are analyzed weekly and you want monthly averages, but July sometimes has four weekly analyses and sometimes five. If you have each year's monthly average but not the weekly data, you should compute a weighted geometric mean of the monthly means, with weights determined by checking the calendar to identify the years in which July had five sampling weeks. @heropup's answer provides the details.

Even doing that won't deal with the fact that week and month boundaries aren't aligned, which will skew monthly averages.

I hope all you need is the approximate seasonal trend, not more precision than the data can provide.
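The calendar-based weighting described in the edit above could be sketched roughly as follows. This assumes, purely for illustration, that samples are always taken on the same weekday (Wednesday here) and uses made-up July means and years; `weekday_count` and `weighted_geometric_mean` are hypothetical helper names.

```python
import calendar
import math

def weekday_count(year, month, weekday):
    """Count how often a weekday (0 = Monday) occurs in the given month."""
    weeks = calendar.monthcalendar(year, month)
    # Days outside the month are reported as 0 by monthcalendar.
    return sum(1 for week in weeks if week[weekday] != 0)

def weighted_geometric_mean(means, weights):
    """Geometric mean of positive values, each weighted by its sample count."""
    total = sum(weights)
    log_sum = math.fsum(w * math.log(m) for m, w in zip(means, weights))
    return math.exp(log_sum / total)

# Hypothetical July geometric means by year (illustrative values only).
july_means = {2015: 12.0, 2016: 15.0, 2017: 9.0, 2018: 20.0, 2019: 11.0}

# Assumption: sampling always happens on a Wednesday.
SAMPLING_WEEKDAY = calendar.WEDNESDAY

weights = [weekday_count(year, 7, SAMPLING_WEEKDAY) for year in july_means]
july_typical = weighted_geometric_mean(list(july_means.values()), weights)
print(weights, round(july_typical, 3))
```

With weights of only 4 or 5, the weighted and unweighted results will usually be close, which is consistent with treating the result as an approximate seasonal trend.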

1
On

The geometric mean of $n$ positive numbers $(x_1, x_2, \ldots, x_n)$ is given by $$\tilde x = (x_1 x_2 \cdot \ldots \cdot x_n)^{1/n} = \left(\prod_{i=1}^n x_i \right)^{1/n}.$$ That is to say, we multiply all the observations in the sample together, and then take the $n^{\rm th}$ root.

Another way to think of this calculation is that if we take the logarithm of each observation in the sample, i.e. $$y_i = \log x_i, \quad i = 1, 2, \ldots, n,$$ then $$\log \tilde x = \log (x_1 x_2 \cdot \ldots \cdot x_n)^{1/n} = \frac{1}{n} \sum_{i=1}^n \log x_i = \frac{1}{n} \sum_{i=1}^n y_i = \bar y.$$ That is to say, the arithmetic mean of the log-transformed values equals the logarithm of the geometric mean.
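A quick numerical check of this identity, as a sketch in Python with an illustrative sample chosen so that the geometric mean is exactly 10:

```python
import math

# Illustrative sample: the product is 1000, so the geometric mean is 10.
sample = [1.0, 10.0, 100.0]
n = len(sample)

# Direct definition: n-th root of the product.
direct = math.prod(sample) ** (1 / n)

# Log route: arithmetic mean of the logs, then exponentiate.
log_mean = sum(math.log(x) for x in sample) / n
via_logs = math.exp(log_mean)

print(direct, via_logs)  # both agree up to floating-point rounding
```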

With this in mind, if we took several samples, not necessarily of equal sample size, then to obtain a grand mean of means, it is clear that we must compute a weighted average, with the weight proportional to the sample size. Explicitly, if we have means $$\bar y_1, \bar y_2, \ldots, \bar y_m$$ with sample sizes $n_1, n_2, \ldots, n_m$, then $$\bar Y = \frac{\sum_{i=1}^m n_i \bar y_i}{\sum_{i=1}^m n_i}.$$

Therefore, the same applies to the geometric mean: $$\tilde X = \left(\prod_{i=1}^m \tilde x_i^{n_i}\right)^{1/\sum n_i}.$$ In other words, you would take each geometric mean $\tilde x_i$, raise it to the power of its sample size $n_i$, multiply everything together, then take the root whose order is the total sample size $\sum n_i$. The product can overflow or underflow when the values or sample sizes are large, which is why it may be better to work with the log-transformed values (natural logarithms here) and then exponentiate: $$\tilde X = \exp \left( \frac{\sum_{i=1}^m n_i \log \tilde x_i}{\sum_{i=1}^m n_i} \right).$$

Let's do an example. Suppose we have $$\begin{array}{lll} (2, 1, 3, 3, 7), & \tilde x_1 = 2.63072, & n_1 = 5 \\ (3, 5, 11), & \tilde x_2 = 5.48481, & n_2 = 3 \\ (9, 2, 41, 6), & \tilde x_3 = 8.1574, & n_3 = 4 \end{array}$$ Then you have $m = 3$ samples, and your grand geometric mean should just be the geometric mean of every observation, i.e. $$\tilde X = (92058120)^{1/12} = 4.60969.$$

But if you were to take the unweighted geometric or arithmetic mean of the $\tilde x_i$, you'd get $4.90075$ and $5.42431$ respectively, neither of which matches. Instead, you need to compute $$\tilde X = \left((2.63072)^5 (5.48481)^3 (8.1574)^4\right)^{1/12} = 4.60969,$$ or $$n_1 \log \tilde x_1 = 4.83628, \quad n_2 \log \tilde x_2 = 5.10595, \quad n_3 \log \tilde x_3 = 8.3957,$$ then $$\tilde X = \exp\left(\frac{4.83628+5.10595+8.3957}{5+3+4}\right) = e^{1.52816} = 4.60969.$$

Note that these calculations simplify if the sample sizes are equal in each case: the weights cancel and the grand geometric mean is just the ordinary geometric mean of the $\tilde x_i$.
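The worked example can be reproduced with a short Python sketch. The variable and function names are illustrative choices; the data are the three samples from the example.

```python
import math

# The three samples from the worked example.
samples = [
    [2, 1, 3, 3, 7],
    [3, 5, 11],
    [9, 2, 41, 6],
]

def geometric_mean(values):
    """Geometric mean computed via the mean of natural logs."""
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

group_means = [geometric_mean(s) for s in samples]   # the x-tilde_i
sizes = [len(s) for s in samples]                    # the n_i

# Weighted combination of the group means, done on the log scale.
weighted_log_sum = math.fsum(n * math.log(g) for g, n in zip(group_means, sizes))
grand_from_means = math.exp(weighted_log_sum / sum(sizes))

# Reference value: geometric mean of all 12 pooled observations.
pooled = [x for s in samples for x in s]
grand_pooled = geometric_mean(pooled)

# Unweighted alternatives, which do not match the pooled result.
naive_geometric = geometric_mean(group_means)
naive_arithmetic = sum(group_means) / len(group_means)

print(f"weighted from means: {grand_from_means:.5f}")
print(f"pooled observations: {grand_pooled:.5f}")
print(f"naive geometric:     {naive_geometric:.5f}")
print(f"naive arithmetic:    {naive_arithmetic:.5f}")
```

Only the size-weighted combination agrees with the pooled geometric mean, so the sample sizes are needed whenever the raw observations are not available.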