Background The paper referenced below at one point makes an argument that goes something like this:
Let $X \sim \text{Gamma}(\alpha, \beta)$, where $\text{Gamma}(\alpha, \beta)$ refers to the Gamma distribution with shape $\alpha$ and rate $\beta$. The log pdf is given by
\begin{align*}
\log p(x \, | \, \alpha, \beta) = \alpha \log \beta - \log \Gamma(\alpha) + (\alpha - 1) \log x - \beta x
\end{align*}
Suppose we want to find a (semi-)conjugate prior for the shape parameter $\alpha$. Viewing the log pdf as a function of $\alpha$, the prior must have sufficient statistics that are linear combinations of $(\alpha, \log \Gamma(\alpha))$. But there is
no exponential-family distribution with density on $\alpha$ whose sufficient statistics include $\log \Gamma(\alpha)$ as a basis function.
Question Why can't $\log \Gamma(\alpha)$ be one of the sufficient statistics of an exponential family distribution?
Reference Winn, J., Bishop, C. M., & Jaakkola, T. (2005). Variational message passing. Journal of Machine Learning Research, 6(4).
The Gamma distribution, a member of the exponential family, can be written as
$$f_X(x \mid \alpha,\beta)=\frac{\beta^\alpha}{\Gamma(\alpha)}\exp\left\{(\alpha-1)\log x-\beta x \right\}$$
thus
$$T=\left(\sum_i \log x_i; \sum_i x_i\right)$$
is jointly sufficient for $\theta=(\alpha, \beta)$.
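As a quick numerical sanity check of the exponential-family form above (a sketch; the parameter values are arbitrary), we can compare it against `scipy.stats.gamma`, which parameterizes the distribution by shape `a` and `scale` $= 1/\beta$:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma

alpha, beta = 3.5, 2.0          # arbitrary shape and rate
x = np.array([0.3, 1.1, 2.7])   # arbitrary evaluation points

# exponential-family form:
# log f = alpha*log(beta) - log Gamma(alpha) + (alpha - 1)*log(x) - beta*x
log_f = alpha * np.log(beta) - gammaln(alpha) + (alpha - 1) * np.log(x) - beta * x

# SciPy uses shape a and scale = 1/rate
assert np.allclose(log_f, gamma.logpdf(x, a=alpha, scale=1.0 / beta))
```

The sufficient statistics $(\log x, x)$ are exactly the terms multiplying $(\alpha - 1)$ and $-\beta$ in the exponent.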
...let's suppose $\beta$ is known.
The gamma model can be factorized as follows:
$$p(\mathbf{x} \mid \alpha)= e^{-\beta\sum_i x_i}\times\frac{\beta^{n \alpha}}{\left[\Gamma(\alpha)\right]^n}\prod_i x_i^{\alpha-1}=\psi(\mathbf{x})\times g(t(\mathbf{x}),n,\alpha)$$
thus the conjugate prior has the form
$$\pi(\alpha)\propto g(s,m,\alpha)$$
that is
$$\pi(\alpha)\propto \frac{c^{m\alpha}}{\left[\Gamma(\alpha)\right]^m}s^{\alpha-1}$$
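As a sanity check that this family really is conjugate (a sketch; the hyperparameter values and data below are made up), one can verify numerically that prior $\times$ likelihood stays in the same family with updated hyperparameters $m' = m+n$, $\log s' = \log s + \sum_i \log x_i$, and $m'\log c' = m\log c + n\log\beta$:

```python
import numpy as np
from scipy.special import gammaln

def log_kernel(alpha, log_c, m, log_s):
    # log of pi(alpha) ∝ c^{m*alpha} / Gamma(alpha)^m * s^{alpha-1}
    return m * alpha * log_c - m * gammaln(alpha) + (alpha - 1.0) * log_s

rng = np.random.default_rng(0)
beta = 2.0                                   # known rate
x = rng.gamma(shape=3.0, scale=1.0 / beta, size=50)
n, sum_log_x = x.size, np.log(x).sum()

log_c, m, log_s = np.log(1.5), 4.0, 0.7      # made-up prior hyperparameters

def log_lik(alpha):
    # log-likelihood in alpha, dropping -beta*sum(x), which is constant in alpha
    return n * alpha * np.log(beta) - n * gammaln(alpha) + (alpha - 1.0) * sum_log_x

# conjugate update
m_post = m + n
log_s_post = log_s + sum_log_x
log_c_post = (m * log_c + n * np.log(beta)) / m_post

alphas = np.linspace(0.5, 10.0, 7)
diff = (log_kernel(alphas, log_c, m, log_s) + log_lik(alphas)
        - log_kernel(alphas, log_c_post, m_post, log_s_post))
# constant in alpha (here zero up to rounding): same family, updated parameters
assert np.allclose(diff, 0.0, atol=1e-8)
```

The prior's own log-kernel is linear in $(\alpha, \log \Gamma(\alpha))$, which is what makes the update close in the family.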
...if $\beta$ is unknown too, use the same procedure to find the desired joint prior for $(\alpha, \beta)$.
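For instance, with $\beta$ unknown the likelihood kernel in $(\alpha, \beta)$ is $\frac{\beta^{n\alpha}}{\left[\Gamma(\alpha)\right]^n}\left(\prod_i x_i\right)^{\alpha-1}e^{-\beta\sum_i x_i}$, so mimicking it (a sketch with hyperparameters $c, m, s, r$; other parameterizations of the same family are common) gives a joint prior of the form
$$\pi(\alpha,\beta)\propto \frac{c^{m\alpha}\,\beta^{m\alpha}}{\left[\Gamma(\alpha)\right]^{m}}\,s^{\alpha-1}\,e^{-r\beta}$$
which is closed under the update $m' = m+n$, $s' = s\prod_i x_i$, $r' = r + \sum_i x_i$, and $(c')^{\,m'} = c^{m}$.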
FYI, here is a useful link with a table of the major statistical models, with detailed priors, posterior parameters, and so on.