A random sample $X_1, \ldots, X_n$ is drawn from a Bernoulli distribution, where $X_i = 1$ with unknown probability $p$ and $X_i = 0$ otherwise. Examine whether the following statistics are sufficient for the parameter $p$:
- $(X_1,\ldots,X_n)$
- $(X_1^2,[X_2+\cdots+X_n]^2)$
My doubt: How are the above two statistics? I am under the impression that a statistic is a function of the sample, i.e., a statistic $T=f(X_1,\ldots,X_n)$. For example, $Y=(X_1+\cdots+X_n)/n$. I can easily check whether $Y$ is sufficient for $p$ or not.
So how are the above two statistics? And how can I check their sufficiency?
A statistic is a function of the sample that does not depend on any unknown parameters of the distribution(s) from which the sample was drawn. This definition does not require such a function to be scalar-valued. It may be (and often is) a vector-valued function.
Therefore, the original sample $\boldsymbol X = (X_1, \ldots, X_n)$ is itself a statistic: it is the vector-valued identity function of the sample, $\boldsymbol T = \boldsymbol f(\boldsymbol X) = \boldsymbol X$. And as such, it is tautologically a sufficient statistic, because it contains as much information about the parameter(s) as is present in the original sample.
The less trivial question is the second one: is the vector-valued statistic $\boldsymbol T = (X_1^2, (X_2 + \cdots + X_n)^2)$ a sufficient statistic? That is to say, does this ordered pair retain as much information about the parameter $p$ as the original sample does? Answering this requires some actual mathematics; e.g., the Factorization Theorem.
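For reference, here is a sketch of how the Factorization Theorem applies to the Bernoulli sample (this is the standard argument, not something specific to this problem):

```latex
% Joint pmf of the Bernoulli sample, for x_i \in \{0,1\}:
f(x_1, \ldots, x_n \mid p)
  = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i}
  = p^{\sum_i x_i}(1-p)^{\,n - \sum_i x_i}.

% This factors as g\!\left(T(\boldsymbol x), p\right) \cdot h(\boldsymbol x) with
T(\boldsymbol x) = \sum_{i=1}^n x_i, \qquad
g(t, p) = p^{t}(1-p)^{n-t}, \qquad
h(\boldsymbol x) = 1.
```

Since the joint pmf factors into a piece depending on the data only through $\sum_i x_i$ and a piece free of $p$, the theorem gives that $\sum_i X_i$, and hence $\bar X$, is sufficient for $p$.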
But here's a hint: if you can show that $\bar X = (X_1 + \cdots + X_n)/n$ is a sufficient statistic for $p$, and that $\bar X$ can be expressed as a function of $\boldsymbol T$, then $\boldsymbol T$ is also sufficient for $p$: given $\boldsymbol T$, you can compute $\bar X$.
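To make the hint concrete, here is one way to recover $\bar X$ from $\boldsymbol T = (T_1, T_2)$, using the fact that each $X_i \in \{0, 1\}$:

```latex
% Because X_1 \in \{0,1\}, squaring is the identity:
T_1 = X_1^2 = X_1.

% Because X_2 + \cdots + X_n \ge 0, the square root undoes the square:
\sqrt{T_2} = \sqrt{(X_2 + \cdots + X_n)^2} = X_2 + \cdots + X_n.

% Hence
\bar X = \frac{1}{n}\left(T_1 + \sqrt{T_2}\right).
```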
This then relates to the concept of a minimal sufficient statistic: if $\bar X$ and $\boldsymbol T$ are both sufficient for $p$, clearly $\bar X$ achieves a greater degree of data reduction than $\boldsymbol T$, which in turn (for $n > 2$) achieves a greater degree of data reduction than the sample $\boldsymbol X$ itself: the sample is not reduced at all, $\boldsymbol T$ is an ordered pair, and $\bar X$ is a single scalar. A sufficient statistic that achieves the maximum possible degree of data reduction is a minimal sufficient statistic.
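If you want to verify minimality directly (not required for the question, but a natural follow-up), one standard route is the ratio criterion: a sufficient statistic $T$ is minimal if the ratio of joint pmfs at two sample points is free of $p$ exactly when the two points give the same value of $T$. For the Bernoulli sample:

```latex
\frac{f(\boldsymbol x \mid p)}{f(\boldsymbol y \mid p)}
  = \frac{p^{\sum_i x_i}(1-p)^{\,n-\sum_i x_i}}
         {p^{\sum_i y_i}(1-p)^{\,n-\sum_i y_i}}
  = \left(\frac{p}{1-p}\right)^{\sum_i x_i - \sum_i y_i},

% which is constant in p if and only if \sum_i x_i = \sum_i y_i.
```

Hence $\sum_i X_i$ (equivalently $\bar X$) is minimal sufficient for $p$, while $\boldsymbol T$ and $\boldsymbol X$ are sufficient but not minimal.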