Which parameter should be considered the "scale" parameter of the Gamma distribution?


I originally posted this question on Cross Validated. In case it is considered too "nerdy" or useless there, I am also posting it here in the hope of getting more replies.


From Wikipedia and probably all statistics textbooks, we know that in the density of a Gamma random variable $$f(x; k, \theta) = \frac{1}{\Gamma(k)\theta^k}x^{k - 1}e^{-\frac{x}{\theta}}, \quad x > 0; \theta > 0, k > 0, \tag{1}$$ $k$ is called the shape parameter and $\theta$ is called the scale parameter. The reason that $\theta$ is referred to as the scale parameter is quite obvious: if $X \sim \text{Gamma}(k, \theta)$, then for any $c > 0$, $cX \sim \text{Gamma}(k, c\theta)$. In other words, for fixed $k$, the Gamma$(k, \theta)$ family is invariant under the scale transformation $X \mapsto cX$. This argument makes a lot of sense, and I have followed this convention for many years.
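As a quick numerical sanity check of the scaling property, here is a minimal sketch using only the Python standard library. By the change-of-variables formula, the density of $cX$ at $y$ is $f(y/c; k, \theta)/c$, which should coincide with $f(y; k, c\theta)$. The function names and the values of `k`, `theta`, `c` are arbitrary choices made for illustration only.

```python
import math

def gamma_pdf(x, k, theta):
    """Density (1): shape k, scale theta, evaluated at x > 0."""
    log_pdf = (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_pdf)

def scaled_pdf(y, k, theta, c):
    """Density of Y = c*X at y when X ~ Gamma(k, theta), by change of variables."""
    return gamma_pdf(y / c, k, theta) / c

# Illustrative values only (assumptions, not taken from the question).
k, theta, c = 2.5, 1.7, 3.0

# Largest absolute gap between the density of cX and the Gamma(k, c*theta) density.
max_gap = max(
    abs(scaled_pdf(y, k, theta, c) - gamma_pdf(y, k, c * theta))
    for y in (0.1, 0.5, 1.0, 4.0, 10.0, 25.0)
)
print(max_gap)
```

The gap should be at the level of floating-point rounding, which is consistent with $cX \sim \text{Gamma}(k, c\theta)$.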

However, when I was answering this question tonight and tried to use the Gamma distribution as an example, I found a conflict that arises if we consider the Gamma distribution as a member of the exponential family. Conventionally, an exponential family has the following representation (for example, see Chapter $7$ of this book): $$f(x; \phi, \varphi) = \exp\left[\frac{x\phi - \gamma(\phi)}{\varphi} + \tau(x, \varphi)\right].\tag{2}$$ In this representation, $\varphi$ is called the scale parameter. Now let's transform $(1)$ into the form $(2)$ and see what happens.

By letting $\varphi = \frac{1}{k}$ and $\phi = -\frac{1}{k\theta}$, $(1)$ can be written as $$f(x; \phi, \varphi) = \exp\left[\frac{x\phi - (-\log(-\phi))}{\varphi} -\frac{\log \varphi}{\varphi} + \left(\frac{1}{\varphi} - 1\right)\log x - \log\Gamma(\varphi^{-1})\right] \tag{3}$$
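The reparameterization can be checked numerically. The sketch below (standard library only; the function names and test values are hypothetical choices for illustration) evaluates the exponential-family form $(3)$ with $\gamma(\phi) = -\log(-\phi)$ and compares it with the usual density $(1)$ at the matching parameters $\phi = -\frac{1}{k\theta}$, $\varphi = \frac{1}{k}$.

```python
import math

def gamma_pdf(x, k, theta):
    """Density (1): shape k, scale theta, evaluated at x > 0."""
    log_pdf = (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_pdf)

def edf_pdf(x, phi, varphi):
    """Density (3): requires phi < 0 and varphi > 0."""
    log_pdf = (
        (x * phi + math.log(-phi)) / varphi    # (x*phi - gamma(phi)) / varphi
        - math.log(varphi) / varphi
        + (1.0 / varphi - 1.0) * math.log(x)
        - math.lgamma(1.0 / varphi)
    )
    return math.exp(log_pdf)

def to_edf_params(k, theta):
    """Map (k, theta) to (phi, varphi) = (-1/(k*theta), 1/k)."""
    return -1.0 / (k * theta), 1.0 / k

# Illustrative parameter pairs and evaluation points (assumptions only).
worst_rel_gap = 0.0
for k, theta in ((0.5, 2.0), (2.5, 1.7), (7.5, 0.3)):
    phi, varphi = to_edf_params(k, theta)
    for x in (0.1, 1.0, 3.0, 8.0):
        p1 = gamma_pdf(x, k, theta)
        p3 = edf_pdf(x, phi, varphi)
        worst_rel_gap = max(worst_rel_gap, abs(p1 - p3) / p1)
print(worst_rel_gap)
```

Term by term, $\frac{x\phi}{\varphi} = -\frac{x}{\theta}$, while $\frac{\log(-\phi)}{\varphi} - \frac{\log\varphi}{\varphi} = -k\log(k\theta) + k\log k = -k\log\theta$, so the two forms agree up to rounding.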

Comparing $(2)$ and $(3)$, it appears that $\varphi = \frac{1}{k}$ should be called the scale parameter! So, based on the nomenclature of the exponential family, it also seems to make sense to refer to $k$ as a scale parameter.

I understand that this small collision may be due to reparameterization and to the fact that the word "scale" is overloaded in statistics and probability. Can anyone offer other explanations, and can this unfortunate collision be resolved? Thank you very much.