Find a minimal sufficient statistic for $U(\theta,\theta+c)$, where $(\theta,c)$ is unknown.


Suppose $X_1,\dots,X_n$ are i.i.d. from a distribution with p.d.f. $$f_{(\theta,c)}(x)=\frac{1}{c}\mathbb{1}_{(x\in[\theta,\theta+c])},$$ where $\theta\in\mathbb{R}$ and $c\in\mathbb{R}^+$ are unknown.

Find a minimal sufficient statistic for $(\theta,c)$.

From the range of $x_i$, i.e. $x_i\in[\theta,\theta+c]$, we can determine that $\theta\leq x_i$ and $\theta+c\geq x_i$, $\forall i\in\{1,\cdots,n\}$. This implies $$\theta\leq x_{(1)}\text{ and }\theta+c\geq x_{(n)},$$ where $x_{(1)}=\underset{i\in\{1,\cdots,n\}}{\min}x_i$ and $x_{(n)}=\underset{i\in\{1,\cdots,n\}}{\max}x_i$.

The plot shows the region where $\theta\leq x_{(1)}\text{ and }\theta+c\geq x_{(n)}$. It looks like $(x_{(1)},x_{(n)})$ is a minimal sufficient statistic for $(\theta,c)$, because $(x_{(1)},x_{(n)})$ uniquely determines the shape of the log-likelihood function. Is this correct?

[illustration: the region $\theta\leq x_{(1)}$ and $\theta+c\geq x_{(n)}$ in the $(\theta,c)$-plane]
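As a quick numerical sanity check (my own sketch, not part of the question), one can verify that the likelihood depends on the sample only through $(x_{(1)}, x_{(n)})$: two samples sharing the same minimum and maximum give identical likelihoods at every $(\theta, c)$.

```python
import math

def log_likelihood(x, theta, c):
    """Log-likelihood of an i.i.d. U[theta, theta+c] sample x;
    -inf when some observation falls outside [theta, theta+c]."""
    if min(x) >= theta and max(x) <= theta + c:
        return -len(x) * math.log(c)
    return float("-inf")

# Two different samples sharing the same minimum and maximum:
x = [0.2, 0.5, 0.9]
y = [0.2, 0.7, 0.9]

# On a grid of parameter values the two log-likelihoods agree everywhere,
# consistent with the likelihood depending on the data only via (x_(1), x_(n)).
for theta in (0.0, 0.1, 0.15, 0.25):
    for c in (0.5, 0.7, 1.0, 2.0):
        assert log_likelihood(x, theta, c) == log_likelihood(y, theta, c)
```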


2 answers below.

Best answer:

(Nobody has answered, so I post this attempt, which I am not sure is correct.)

We show that $(Y_{(1)},Y_{(n)})$, the pair of extreme order statistics of the sample, is a complete sufficient statistic, which implies that it is minimal sufficient. We first give the p.d.f. of $(Y_{(1)},Y_{(n)})$: $$f(y_1, y_n;\theta,c)=\frac{n(n-1)}{c^n}(y_n-y_1)^{n-2},\quad\forall\, \theta\leq y_1\leq y_n\leq \theta+c\text{ and } (\theta,c)\in\mathbb{R}\times\mathbb{R}^+.$$
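A small numerical sketch (my own check, assuming the density formula above) verifies that it integrates to 1 over the triangle $\theta\le y_1\le y_n\le\theta+c$, via a midpoint Riemann sum:

```python
def min_max_density(y1, yn, n, theta, c):
    """Joint density of (Y_(1), Y_(n)) for n i.i.d. U[theta, theta+c] draws."""
    if theta <= y1 <= yn <= theta + c:
        return n * (n - 1) / c**n * (yn - y1)**(n - 2)
    return 0.0

# Midpoint Riemann sum over the square [theta, theta+c]^2; the density is
# zero off the triangle y1 <= yn, so the total mass should be close to 1.
n, theta, c, m = 4, -1.0, 2.0, 400
h = c / m
total = 0.0
for i in range(m):
    for j in range(m):
        y1 = theta + (i + 0.5) * h
        yn = theta + (j + 0.5) * h
        total += min_max_density(y1, yn, n, theta, c) * h * h
print(round(total, 2))  # close to 1.0
```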

Then, for any function $g(y_1,y_n)$ such that $\mathbb{E}_{\theta,c}\left[g(Y_{(1)},Y_{(n)})\right]=0$ for all $(\theta,c)\in\mathbb{R}\times\mathbb{R}^+$, we have $$0=\frac{n(n-1)}{c^n}\int_{\theta}^{\theta+c}\int_{\theta}^{y_n}g(y_1,y_n)(y_n-y_1)^{n-2}\, dy_1\, dy_n,\quad\forall (\theta,c)\in\mathbb{R}\times\mathbb{R}^+.$$ The region of integration for $(y_1, y_n)$ is the triangle with vertices $(\theta,\theta)$, $(\theta,\theta+c)$ and $(\theta+c,\theta+c)$. As $\theta\in\mathbb{R}$ and $c\in\mathbb{R}^+$ vary, these triangles generate the Borel $\sigma$-algebra on $\mathcal{B}=\{(x,z)\in\mathbb{R}^2:x\leq z\}$. Thus $$0=\int_{A}g(y_1,y_n)(y_n-y_1)^{n-2}\, d(y_1,y_n)\quad\text{for every Borel set }A\subset\mathcal{B}.$$ Since $(y_n-y_1)^{n-2}>0$ almost everywhere on $\mathcal{B}$, this means $$g(y_1,y_n)(y_n-y_1)^{n-2}=0\text{ a.e.}\iff g= 0\text{ a.e.}$$ Thus we conclude that $(Y_{(1)},Y_{(n)})$ is a complete statistic for $(\theta,c)$.

Since $(Y_{(1)},Y_{(n)})$ is sufficient by the factorization theorem and complete, and a complete sufficient statistic is minimal sufficient (Bahadur's theorem), we conclude that $(Y_{(1)},Y_{(n)})$ is minimal sufficient.

Another answer:

This is an attempt at an alternative proof that $(x_{(1)}, x_{(n)})$ is a minimal sufficient statistic for $(\theta, c)$ using a direct approach.

Consider the likelihood ratio $$ R(x,y) = \frac{p(x | \theta, c)}{p(y | \theta, c)} $$ where $x = (x_{1}, x_{2}, \dots, x_{n})$ and $y = (y_{1}, y_{2}, \dots, y_{n})$. The standard characterization of minimal sufficiency (due to Lehmann and Scheffé; it is not the factorization theorem) states that if a statistic $T(x)$ has the following property P, then it is a minimal sufficient statistic for $\theta$:

P: $R(x, y)$ does not depend on $\theta$ if and only if $T(x) = T(y)$.

I claim that $T(x) = (x_{(1)}, x_{(n)})$ has this property and therefore is minimally sufficient.

To see this, first consider what $p(x | \theta, c)$ is. I claim that $$p(x | \theta, c) = c^{-n} I[x_{(1)} \geq \theta, x_{(n)} \leq \theta+c].$$ The indicator $I[x_{(1)} \geq \theta, x_{(n)} \leq \theta+c]$ just determines if $x$ is a valid value in the range; certainly $p(x | \theta, c) = 0$ if $x$ is not within $[\theta, \theta+c]$. Otherwise, since the distribution is uniform we have that $p(x | \theta, c)$ should just be $c^{-n}$ because the density function of each $x_{i}$ is just $c^{-1}$.

Therefore, we have that $$ R(x, y) = \frac{p(x | \theta, c)}{p(y | \theta, c)} = \frac{I[x_{(1)} \geq \theta, x_{(n)} \leq \theta+c]}{I[y_{(1)} \geq \theta, y_{(n)} \leq \theta+c]} $$ since the $c^{-n}$ values cancel out.

At this point, I should admit that the theorem is not, strictly speaking, applicable here, since the ratio is undefined when both densities vanish, but I'd guess that it is possible to extend the theorem to this situation. Intuitively, the ratio of $p(x | \theta, c)$ to $p(y | \theta, c)$ is independent of $(\theta, c)$ if and only if $T(x) = T(y)$, despite the technicality that $R(x, y) = \frac{0}{0}$ when neither $x$ nor $y$ lies in $[\theta, \theta+c]^n$. That is, if we adopt the convention $\frac{0}{0} = 1$, then $R(x, y) = 1$ whenever $T(x) = T(y)$, whereas it can be $0$ or $+\infty$ for some $(\theta, c)$ when $T(x) \neq T(y)$.
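A small Python sketch (my own, using the hypothetical convention $\frac{0}{0}=1$ described above) illustrates the claim: the ratio is constant over $(\theta,c)$ exactly when the two samples share the same $(x_{(1)}, x_{(n)})$.

```python
import math

def likelihood(x, theta, c):
    """Uniform likelihood c^{-n} when the whole sample fits in [theta, theta+c]."""
    if min(x) >= theta and max(x) <= theta + c:
        return c ** -len(x)
    return 0.0

def ratio(x, y, theta, c):
    """Likelihood ratio with the conventions 0/0 -> 1 and a/0 -> +inf."""
    px, py = likelihood(x, theta, c), likelihood(y, theta, c)
    if px == py == 0.0:
        return 1.0
    return px / py if py > 0 else math.inf

x  = [0.2, 0.5, 0.9]   # T(x)  = (0.2, 0.9)
y1 = [0.2, 0.4, 0.9]   # same (min, max) as x
y2 = [0.3, 0.5, 0.9]   # different minimum

grid = [(t, c) for t in (0.0, 0.15, 0.25) for c in (0.7, 1.0, 2.0)]
r_same = {ratio(x, y1, t, c) for t, c in grid}  # constant in (theta, c)
r_diff = {ratio(x, y2, t, c) for t, c in grid}  # varies with (theta, c)
print(r_same, len(r_diff) > 1)  # {1.0} True
```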