Can sufficient statistics include indicators?


For example, the Beta distribution is defined on $[0,1]$, and if we know that all the observations are in this range, then valid sufficient statistics would be $\prod x_i$ and $\prod (1-x_i)$. However, if we don't know whether all the observations lie between $0$ and $1$, shouldn't we add the following indicators to the sufficient statistics:

$$I(x_{(1)} < 0)$$ $$I(x_{(n)} > 1)$$

or just $x_{(1)}$, $x_{(n)}$?

Best answer

In the Factorization theorem, the joint density is expressed in the form $$f(\boldsymbol x \mid \boldsymbol \theta) = h(\boldsymbol x) g(\boldsymbol T(\boldsymbol x) \mid \boldsymbol \theta),$$ where $h$ is a function of the sample $\boldsymbol x = (x_1, \ldots, x_n)$ but not of the parameters $\boldsymbol \theta = (\theta_1, \ldots, \theta_m)$, and $g$ is a function of the parameters that depends on the sample only through some (possibly multivariate) function $\boldsymbol T$ of the sample. This function $\boldsymbol T$ is then our sufficient statistic. The idea here is that we factor out the part of the density that does not depend on the parameters, because that part is not informative about their values; the part that is informative is expressible in terms of some function of the sample that hopefully achieves some data reduction.
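As a rough numerical illustration of this factorization, here is a minimal Python sketch for the Beta example from the question. It assumes every observation lies strictly inside $(0,1)$ and that the sample size $n$ is known. The function names are hypothetical, chosen only for this example. It uses the sums of $\log x_i$ and $\log(1-x_i)$, which carry the same information as the two products, and checks that the log-likelihood computed from the raw sample agrees with the one computed from $\boldsymbol T$ alone.

```python
import math


def log_beta_fn(a, b):
    """Log of the Beta function B(a, b), via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_loglik_full(xs, a, b):
    """Beta(a, b) log-likelihood computed from every raw observation.

    Assumes all observations lie strictly inside (0, 1).
    """
    return sum(
        (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta_fn(a, b)
        for x in xs
    )


def sufficient_stat(xs):
    """T(x) = (sum of log x_i, sum of log(1 - x_i)).

    These are the logs of prod x_i and prod (1 - x_i).
    """
    return (sum(math.log(x) for x in xs), sum(math.log(1 - x) for x in xs))


def beta_loglik_from_stat(n, stat, a, b):
    """The same log-likelihood, using only the sample size n and T(x)."""
    s_log_x, s_log_1mx = stat
    return (a - 1) * s_log_x + (b - 1) * s_log_1mx - n * log_beta_fn(a, b)


if __name__ == "__main__":
    sample = [0.12, 0.55, 0.73, 0.91]  # made-up data for illustration
    t = sufficient_stat(sample)
    for a, b in [(0.5, 0.5), (2.0, 3.0), (6.0, 1.5)]:
        full = beta_loglik_full(sample, a, b)
        reduced = beta_loglik_from_stat(len(sample), t, a, b)
        print(a, b, math.isclose(full, reduced, rel_tol=1e-9, abs_tol=1e-9))
```

Once `sufficient_stat` has been evaluated, the raw sample is no longer needed to compare parameter values, which is the data reduction the theorem describes.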

In light of this, you can immediately see that those indicators you mentioned do not depend on the parameters, so they are part of $h$ and not $\boldsymbol T$. Including them as part of the sufficient statistic is unnecessary because there is no information about the parameters that is lost by omitting them.
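To make this concrete for the Beta case, here is a sketch of the factorization with the support indicators written out, taking $\boldsymbol \theta = (\alpha, \beta)$ and writing $B(\alpha, \beta)$ for the Beta function (notation assumed here, not taken from the question):

$$f(\boldsymbol x \mid \alpha, \beta) = \prod_{i=1}^n \frac{x_i^{\alpha-1}(1-x_i)^{\beta-1}}{B(\alpha,\beta)}\, I(0 \le x_i \le 1) = \underbrace{I(x_{(1)} \ge 0)\, I(x_{(n)} \le 1)}_{h(\boldsymbol x)} \cdot \underbrace{\frac{\left(\prod_{i=1}^n x_i\right)^{\alpha-1} \left(\prod_{i=1}^n (1-x_i)\right)^{\beta-1}}{B(\alpha,\beta)^n}}_{g(\boldsymbol T(\boldsymbol x) \mid \alpha, \beta)}$$

with $\boldsymbol T(\boldsymbol x) = \left(\prod x_i, \prod (1-x_i)\right)$. The indicators here are the complements of the ones in the question, and they land in $h$ because the support $[0,1]$ does not involve $\alpha$ or $\beta$.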