How to interpret minimal sufficient statistics?


I know the definition of minimal sufficient statistics is:

A sufficient statistic $T(X)$ is called a minimal sufficient statistic if, for any other sufficient statistic $T'(X)$, $T(X)$ is a function of $T'(X)$.

I want to know how to interpret this definition from a function viewpoint. $T$ is "smaller" than $T'$ if $T$ is a function of $T'$. What does that mean?

Accepted answer

The notion of sufficiency is data compression.

A sufficient statistic is a statistic which, in some sense, contains all the useful information in the sample $X$ for estimating the true parameter.

The definition of a sufficient statistic is the following: the conditional distribution of the sample $X$ given $T(X)$ does not depend on $\theta$, which says, in plainer language, that $T(X)$ is a summary of $X$. Immediately, we see that each sufficient statistic defines an equivalence relation on the sample space. Again, we are compressing data, so a set of samples which yield the same information about the true parameter is compressed into a single equivalence class. For example, suppose we run $n$ iid Bernoulli($\theta$) trials; any two samples with the same sum may be regarded as the same sample for the purpose of estimating $\theta$.
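The Bernoulli claim can be checked exactly: the conditional probability of a particular sample given its sum is $1/\binom{n}{k}$, whatever $\theta$ is. A small Python sketch (the function name `conditional_prob` is mine, not from the answer; exact rational arithmetic via `Fraction` makes the cancellation visible):

```python
from fractions import Fraction
from math import comb

def conditional_prob(sample, theta):
    """P(X = sample | sum(X) = k) for iid Bernoulli(theta) trials."""
    n, k = len(sample), sum(sample)
    p_sample = theta**k * (1 - theta)**(n - k)            # P(X = sample)
    p_sum = comb(n, k) * theta**k * (1 - theta)**(n - k)  # P(sum(X) = k)
    return p_sample / p_sum                               # theta cancels out

# The conditional probability is 1/C(n, k) for every theta:
for theta in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    assert conditional_prob((1, 0, 1, 1, 0), theta) == Fraction(1, comb(5, 3))
```

Since the conditional distribution given the sum is free of $\theta$, the sum is sufficient, and all samples sharing a sum land in one equivalence class.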

We want maximal compression, which is a sensible way to understand minimality. All the definition of a minimal sufficient statistic $T$ says is: for any sufficient statistic $T'$, if $T'(X) = T'(Y)$, then $T(X) = T(Y)$. To define $T$ as the composition of $T'$ with a function $g$ on the image of $T'$, we may have to define $g$ in a funny way, but the point is that it always exists. We can think of the equivalence classes induced by a "smaller" sufficient statistic as the result of grouping and merging the equivalence classes induced by a "larger" sufficient statistic.
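The grouping-and-merging picture can be made concrete in the Bernoulli setting. A quick Python sketch (my own construction, not from the answer): take $T'(x) = $ (sum of first half, sum of second half), which is sufficient, and $T(x) = $ total sum, which is smaller since $T = g(T')$ with $g(a, b) = a + b$. Each equivalence class of $T$ is then a union of classes of $T'$:

```python
from itertools import product

samples = list(product((0, 1), repeat=4))

# T' = (sum of first two, sum of last two): a "larger" sufficient statistic.
Tp = lambda x: (x[0] + x[1], x[2] + x[3])
# T = total sum; T = g(T') with g the sum of the two components.
T = lambda x: sum(x)
g = lambda t: t[0] + t[1]

# T is a function of T' ...
for x in samples:
    assert T(x) == g(Tp(x))

# ... so each equivalence class of T is a union (merge) of classes of T'.
classes_T = {t: {x for x in samples if T(x) == t} for t in {T(x) for x in samples}}
classes_Tp = {t: {x for x in samples if Tp(x) == t} for t in {Tp(x) for x in samples}}
for t, cls in classes_T.items():
    merged = set().union(*(c for tp, c in classes_Tp.items() if g(tp) == t))
    assert cls == merged
```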

Another answer

If $T' \le T$ (that is, $T'$ is the "smaller" statistic), then there exists a surjective function from the image of $T$ onto the image of $T'$. For this you will need to recall what the definition of a function is: every input maps to exactly one output.

As an example, consider $\mathrm{Unif}(-\theta, \theta)$ with $\theta>0$. An iid sample has likelihood function $f(x\mid\theta)=\frac{1}{2^n\theta^n}I(X_{(1)}\ge-\theta)\,I(X_{(n)}\le\theta)$, and so by the factorization theorem $T=(X_{(1)}, X_{(n)})$ is a sufficient statistic for $\theta$.

It is not minimal sufficient, though, because $f(x\mid\theta)\propto I(-X_{(1)}\le\theta)\,I(X_{(n)}\le\theta)=I(\max\{-X_{(1)},X_{(n)}\}\le\theta)$. So $T'=\max\{-X_{(1)},X_{(n)}\}$ is a minimal sufficient statistic.

There exists a surjective function from $T$ to $T'$, namely taking the max of $-X_{(1)}$ and $X_{(n)}$. But there is no function $g$ with $g(T') = T$. In this sense you can construct $T'$ from $T$ but not the other way around. For this example, if you know the largest and smallest values, you know the largest absolute value. But if you know only the largest absolute value, you don't necessarily know the smallest or largest value. (It could come from either the largest or the smallest value, and then the other remains unknown.)
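The one-way construction in this example is easy to verify numerically. A minimal sketch (variable names are mine; the reflected sample $y = -x$ is a hypothetical second sample chosen to share the same $T'$):

```python
import random

random.seed(0)
theta = 3.0
x = [random.uniform(-theta, theta) for _ in range(10)]

# T = (min, max); T' = max(-min, max) = largest absolute value in the sample.
T = (min(x), max(x))
Tp = max(-T[0], T[1])

# T' is a function of T: it equals the largest |X_i|.
assert Tp == max(abs(v) for v in x)

# But T cannot be recovered from T' alone: the reflected sample has the
# same T' yet a different T, so no function of T' can return T.
y = [-v for v in x]
assert max(-min(y), max(y)) == Tp    # same T'
assert (min(y), max(y)) != T         # different T
```

The reflected sample is exactly the ambiguity described above: knowing only the largest absolute value, you cannot tell whether it came from the maximum or the minimum.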