I am self-studying Casella and Berger's Statistical Inference and am completely stuck on Exercise 7.36, on the Pitman estimator of scale. I looked up Pitman's original paper (published in 1939), where this estimator was introduced, but I can't read it because it is behind a paywall.
Here is the formula for the estimator; $x_i$ are $n$ samples from a distribution with a scale parameter, $f$ is the PDF for this distribution:
$$ d_p^r = {\int_0^\infty t^{n+r-1} \prod_{i=1}^n f(tx_i) dt \over \int_0^\infty t^{n+2r-1} \prod_{i=1}^n f(tx_i) dt} $$
This estimates the $r$th power of the scale parameter. It has two important properties: it is scale-equivariant, and among all scale-equivariant estimators it has the smallest scaled mean squared error.
In this context, scale equivariance means that $d_p^r$ has this property:
$$ d_p^r(cx_1, \ldots, cx_n) = c^r d_p^r(x_1,\ldots,x_n), \quad \forall c > 0 $$
First, I am trying to figure out why this estimator works. If $\prod_{i=1}^n f(tx_i)$ is a measure of probability of a certain scale value being correct, then what would have made sense to me would be:
$$ {\int_0^\infty t \prod_{i=1}^n f(tx_i) dt \over \int_0^\infty \prod_{i=1}^n f(tx_i) dt} $$
i.e. average over all possible values of $t$, weighted by the 'measure of probability', and then apply a normalization factor to compensate for the fact that all our 'measure of probability' values may not integrate to 1.
Second, I am struggling to see why this estimator should be scale-equivariant. In the formula, $x_i$ only appears inside $f()$. We don't know anything at all about what $f$ is, aside from it being a PDF with a scale parameter, so it seems very difficult to reason about how the value of the expression will be affected by changing $f(tx_i)$ to $f(ctx_i)$.
Part 1.
I haven't read the original Pitman (1939) paper, but the result can be derived by following the procedure described in Chapter 3 of Lehmann & Casella's Theory of Point Estimation:
For a scale family, the maximal invariant is $\mathbf{z}=\bigl(\frac{x_{1}}{|x_{1}|},\frac{x_{i}}{x_{1}}\text{ for }i=2,\ldots,n\bigr)$. It follows from Theorem 3.3.1 that any scale-equivariant estimator of $\sigma^{r}$ can be written as $\delta(\mathbf{x})=\frac{\delta_{0}(\mathbf{x})}{w(\mathbf{z})}$, where $\delta_{0}(\mathbf{x})$ is any fixed scale-equivariant estimator and $w(\mathbf{z})$ is a function of the maximal invariant. Using this representation, we can write the risk under scaled squared error loss as
$$\begin{align*} R(\sigma^{r},\delta)=R(1,\delta)=\mathbb{E}_{1}\biggl[\biggl(\frac{\delta_{0}(\mathbf{x})}{w(\mathbf{z})}-1\biggr)^{2}\biggr]=\mathbb{E}_{1}\Biggl[\mathbb{E}_{1}\biggl[\biggl(\frac{\delta_{0}(\mathbf{x})}{w(\mathbf{z})}-1\biggr)^{2}\bigg|\mathbf{z}\biggr]\Biggr], \end{align*}$$
where $\mathbb{E}_{1}[\cdot]$ denotes expectation under $\sigma=1$. Writing $a=1/w(\mathbf{z})$, the inner conditional expectation is the quadratic $a^{2}\,\mathbb{E}_{1}[\delta_{0}^{2}(\mathbf{x})|\mathbf{z}]-2a\,\mathbb{E}_{1}[\delta_{0}(\mathbf{x})|\mathbf{z}]+1$, which is minimized at $a=\frac{\mathbb{E}_{1}[\delta_{0}(\mathbf{x})|\mathbf{z}]}{\mathbb{E}_{1}[\delta_{0}^{2}(\mathbf{x})|\mathbf{z}]}$. The optimal choice is therefore $w^{\ast}(\mathbf{z})=\frac{\mathbb{E}_{1}[\delta_{0}^{2}(\mathbf{x})|\mathbf{z}]}{\mathbb{E}_{1}[\delta_{0}(\mathbf{x})|\mathbf{z}]}$.
Now take $\delta_{0}(\mathbf{x})=x_{1}^{r}$, which is scale equivariant. The derivation above then gives the optimal estimator $\delta^{\ast}(\mathbf{x})=x_{1}^{r}\cdot\frac{\mathbb{E}_{1}[x_{1}^{r}|\mathbf{z}]}{\mathbb{E}_{1}[x_{1}^{2r}|\mathbf{z}]}$. Note that the Jacobian is $|\partial \mathbf{x}/\partial (x_{1},\mathbf{z})|=|x_{1}|^{n-1}$, hence the conditional density of $x_{1}$ given $\mathbf{z}$ is
$$\begin{align*} f_{x_{1}|\mathbf{z}}(x_{1}|\mathbf{z})=\frac{|x_{1}|^{n-1}f(x_{1})\prod_{i=2}^{n}f(z_{i}x_{1})}{\int_{0}^{\infty}u^{n-1}f(u)\prod_{i=2}^{n}f(z_{i}u)du}. \end{align*}$$
Using this and the definition of $\mathbf{z}$ (so that $z_{i}x_{1}=x_{i}$), the substitution $u=tx_{1}$ gives
$$\begin{align*} \mathbb{E}_{1}[x_{1}^{r}|\mathbf{z}]&=\frac{\int_{0}^{\infty}u^{n+r-1}f(u)\prod_{i=2}^{n}f(z_{i}u)du}{\int_{0}^{\infty}u^{n-1}f(u)\prod_{i=2}^{n}f(z_{i}u)du}\\ &=x_{1}^{r}\cdot\frac{\int_0^\infty t^{n+r-1}\prod_{i=1}^n f(tx_i) dt}{\int_0^\infty t^{n-1}\prod_{i=1}^n f(tx_i) dt}. \end{align*}$$
The same result holds for $\mathbb{E}_{1}[x_{1}^{2r}|\mathbf{z}]$ with $r$ replaced by $2r$. Substituting both into $\delta^{\ast}(\mathbf{x})$, the powers of $x_{1}$ and the common factor $\int_0^\infty t^{n-1}\prod_{i=1}^n f(tx_i) dt$ cancel, which gives Pitman's estimator.
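As a sanity check on the formula, here is a minimal Python sketch that evaluates the two integrals numerically for the exponential density $f(x)=e^{-x}$, $x\ge 0$. In that case $\prod_i f(tx_i)=e^{-t\sum_i x_i}$ and both integrals are Gamma integrals, so the estimator should reduce to $\frac{\Gamma(n+r)}{\Gamma(n+2r)}\bigl(\sum_i x_i\bigr)^{r}$. The helper names (`simpson`, `pitman_scale`), the sample values, and the truncation point `upper` are all illustrative assumptions, not something taken from the book or the paper.

```python
import math

def simpson(g, a, b, m=20000):
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def pitman_scale(xs, f, r=1, upper=30.0):
    """Ratio of the two integrals defining d_p^r, by numerical quadrature.

    `upper` truncates the infinite range of integration; this assumes the
    integrand is negligible beyond it for the data at hand.
    """
    n = len(xs)

    def lik(t):
        # prod_i f(t * x_i), the product appearing in both integrals
        return math.prod(f(t * x) for x in xs)

    num = simpson(lambda t: t ** (n + r - 1) * lik(t), 0.0, upper)
    den = simpson(lambda t: t ** (n + 2 * r - 1) * lik(t), 0.0, upper)
    return num / den

def exp_pdf(x):
    """Standard exponential density (scale parameter equal to 1)."""
    return math.exp(-x) if x >= 0 else 0.0

xs = [0.8, 1.7, 0.4, 2.9, 1.1]  # made-up sample, for illustration only
r = 1
numeric = pitman_scale(xs, exp_pdf, r)
closed = math.gamma(len(xs) + r) / math.gamma(len(xs) + 2 * r) * sum(xs) ** r
print(numeric, closed)
```

For $r=1$ the closed form is $\sum_i x_i/(n+1)$, so the two printed numbers should agree up to quadrature error.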
Part 2.
The proof of scale equivariance is easier. For the denominator, the change of variable $v=ct$ gives $$\begin{align*} \int_0^\infty t^{n+2r-1}\prod_{i=1}^n f(ctx_i) dt&=\frac{1}{c^{n+2r-1}}\int_0^\infty (ct)^{n+2r-1}\prod_{i=1}^n f(ctx_i) \frac{1}{c}d(ct)\\ &=\frac{1}{c^{n+2r}}\int_0^\infty v^{n+2r-1}\prod_{i=1}^n f(vx_i) dv. \end{align*}$$ Similarly, for the numerator, $$\begin{align*} \int_0^\infty t^{n+r-1}\prod_{i=1}^n f(ctx_i) dt&=\frac{1}{c^{n+r-1}}\int_0^\infty (ct)^{n+r-1}\prod_{i=1}^n f(ctx_i) \frac{1}{c}d(ct)\\ &=\frac{1}{c^{n+r}}\int_0^\infty v^{n+r-1}\prod_{i=1}^n f(vx_i) dv. \end{align*}$$ Dividing the two, the prefactors combine to $c^{n+2r}/c^{n+r}=c^{r}$, so $$\frac{\int_0^\infty t^{n+r-1}\prod_{i=1}^n f(ctx_i) dt}{\int_0^\infty t^{n+2r-1}\prod_{i=1}^n f(ctx_i) dt}=c^{r}\cdot \frac{\int_0^\infty t^{n+r-1}\prod_{i=1}^n f(tx_i) dt}{\int_0^\infty t^{n+2r-1}\prod_{i=1}^n f(tx_i) dt}.$$ Note that nothing about $f$ was used beyond the fact that the integrals exist.
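The change-of-variable argument can also be checked numerically. The sketch below uses a half-normal density as a stand-in for $f$ (an arbitrary choice for illustration), evaluates the estimator on a made-up sample and on the same sample multiplied by $c$, and compares the ratio of the two values with $c^{r}$. The function names, the sample, and the truncation point `upper` are assumptions of this example.

```python
import math

def simpson(g, a, b, m=20000):
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def half_normal_pdf(x):
    """Half-normal density with unit scale, supported on x >= 0."""
    return math.sqrt(2 / math.pi) * math.exp(-x * x / 2) if x >= 0 else 0.0

def pitman_scale(xs, f, r, upper=20.0):
    """Ratio of the two defining integrals, truncated at `upper` (assumed
    large enough that the neglected tail does not matter)."""
    n = len(xs)

    def lik(t):
        return math.prod(f(t * x) for x in xs)

    num = simpson(lambda t: t ** (n + r - 1) * lik(t), 0.0, upper)
    den = simpson(lambda t: t ** (n + 2 * r - 1) * lik(t), 0.0, upper)
    return num / den

xs = [0.5, 1.2, 0.9, 2.1]  # made-up sample, for illustration only
c, r = 3.0, 2
base = pitman_scale(xs, half_normal_pdf, r)
scaled = pitman_scale([c * x for x in xs], half_normal_pdf, r)
ratio = scaled / base  # equivariance predicts this is close to c ** r
print(ratio, c ** r)
```

Swapping in a different density for `half_normal_pdf` should leave the ratio unchanged, which is the point of the argument: equivariance comes from the structure of the integrals, not from the particular $f$.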