I am learning about sufficient statistics. The general idea seems straightforward, but it becomes really confusing when I take a closer look (probably because of the conditional probability involved). The big picture is that, for a given family of distributions, a sufficient statistic contains all the information about the corresponding parameter that the observed data provide. Consider $X_1, X_2 \overset{\text{iid}}{\sim} \text{Poi}(\lambda)$ with $\lambda$ unknown. One can show that $$P(X_1 = x_1 \mid X_1 + X_2 = t) = \binom{t}{x_1}\dfrac{1}{2^{x_1}} \dfrac{1}{2^{t-x_1}} = \binom{t}{x_1}\dfrac{1}{2^t},$$ which does not depend on $\lambda$, so $X_1 + X_2$ is a sufficient statistic by definition. This is a good, concrete example, but I am still confused about some details.
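(For concreteness, here is the computation behind that conditional probability, using the fact that $X_1 + X_2 \sim \text{Poi}(2\lambda)$; note how $\lambda$ cancels:)

$$
P(X_1 = x_1 \mid X_1 + X_2 = t)
= \frac{P(X_1 = x_1)\,P(X_2 = t - x_1)}{P(X_1 + X_2 = t)}
= \frac{\dfrac{e^{-\lambda}\lambda^{x_1}}{x_1!}\cdot\dfrac{e^{-\lambda}\lambda^{t-x_1}}{(t-x_1)!}}{\dfrac{e^{-2\lambda}(2\lambda)^{t}}{t!}}
= \binom{t}{x_1}\frac{\lambda^{t}}{2^{t}\lambda^{t}}
= \binom{t}{x_1}\frac{1}{2^{t}}.
$$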
- In terms of estimating the parameter $\lambda$, it makes sense to say that $X_1$ alone is insufficient: both $X_1$ and $\frac{X_1 + X_2}{2}$ are unbiased estimators, but the latter has smaller variance, so it carries less uncertainty and hence more information about the true parameter. Is this intuition correct?
- From the first point, we know $X_1 + X_2$ contains more information, but I don't see why it contains *all* the information about $\lambda$. Clearly, once we know the sum, we know the conditional distribution, which in this case is binomial. Is this what "contains all the information" means: that we can generate data with exactly the same distribution from the statistic alone, without knowing anything about the parameter?
- I don't see how "contains all the information" helps with inference about the true parameter. In this example, the conditional distribution is not even a Poisson distribution, so I don't see how it aids the inference.
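As a sanity check on the claim in the example, here is a small simulation sketch (stdlib only; the helper names `poisson` and `conditional_dist` are my own, not from any library): the empirical distribution of $X_1$ given $X_1 + X_2 = t$ should match $\text{Binomial}(t, 1/2)$ regardless of which $\lambda$ generated the data.

```python
import math
import random

random.seed(0)

def poisson(lam):
    # Knuth's multiplication method for sampling Poisson(lam)
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def conditional_dist(lam, t, n=200_000):
    # Empirical distribution of X1 given X1 + X2 = t,
    # estimated by rejection: keep only pairs whose sum is t.
    counts = [0] * (t + 1)
    total = 0
    for _ in range(n):
        x1, x2 = poisson(lam), poisson(lam)
        if x1 + x2 == t:
            counts[x1] += 1
            total += 1
    return [c / total for c in counts]

t = 4
for lam in (1.0, 3.0):
    print(lam, [round(p, 3) for p in conditional_dist(lam, t)])

# Theoretical Binomial(t, 1/2) pmf: C(t, x) / 2^t
print([math.comb(t, x) / 2**t for x in range(t + 1)])
```

Both printed empirical distributions should be close to $(0.0625, 0.25, 0.375, 0.25, 0.0625)$, illustrating that once the sum is fixed, the data carry no further trace of $\lambda$.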