How to find the sufficient statistic for a given vector of parameters?


Let $Y=(Y_1,\ldots,Y_n)$ be a random sample from $N(\mu,\sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown. Let $\theta=(\mu,\sigma^2)$ be the vector of parameters of interest.

I need to find the sufficient statistic for $\theta$ and I know I need to use the factorization theorem for this. Here is what I have done:

$$ \begin{align} f_Y(y;\mu,\sigma^2) &= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^ne^{-\frac{1}{2\sigma^2}\sum\limits_{i=1}^n (y_i-\mu)^2} \\ & = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\frac{1}{2\sigma^2} \left(\sum\limits_{i=1}^n y_i^2 -2\mu \sum\limits_{i=1}^n y_i + n\mu^2 \right) } \\ & = b(h(y),\mu,\sigma^2)c(y) \end{align}$$

where:

  • $c(y)=1$
  • $b(h(y),\mu,\sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\frac{1}{2\sigma^2} \left(\sum\limits_{i=1}^n y_i^2 -2\mu \sum\limits_{i=1}^n y_i + n\mu^2 \right) }$
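The factorization above can be checked numerically: the joint density should depend on the data only through the two sums $\sum y_i$ and $\sum y_i^2$. A minimal sketch in Python (the sample values and parameter choices here are illustrative, not from the problem):

```python
import math
import random

# Check that the N(mu, sigma^2) joint log-density of a sample depends on
# the data only through T1 = sum(y_i) and T2 = sum(y_i^2).
random.seed(0)
mu, sigma2, n = 1.5, 2.0, 10
y = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]

# Log-density computed directly from the raw sample.
logf_direct = (-n / 2 * math.log(2 * math.pi * sigma2)
               - sum((yi - mu) ** 2 for yi in y) / (2 * sigma2))

# Log-density computed from the two sums alone (the expanded form above).
T1 = sum(y)
T2 = sum(yi ** 2 for yi in y)
logf_sums = (-n / 2 * math.log(2 * math.pi * sigma2)
             - (T2 - 2 * mu * T1 + n * mu ** 2) / (2 * sigma2))

print(abs(logf_direct - logf_sums) < 1e-9)
```

The two computations agree up to floating-point rounding, since expanding $\sum (y_i-\mu)^2 = \sum y_i^2 - 2\mu\sum y_i + n\mu^2$ is an exact algebraic identity.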

Now, I am not sure what to put for $h(y)$. I could say:

  1. $h(y)=\left(\sum\limits_{i=1}^n y_i,\, \sum\limits_{i=1}^n y_i^2 \right)$ is a sufficient statistic for $(\mu,\sigma^2)$.

OR

  2. $h(y)=\left( \sum\limits_{i=1}^n y_i^2 ,\, \sum\limits_{i=1}^n y_i \right)$ is a sufficient statistic for $(\mu,\sigma^2)$.

How do I tell which version is the correct one to write?

Accepted answer:

Both are sufficient statistics. Or, more precisely, either one is a minimal sufficient statistic for this family of distributions: if you know the value of either, you can compute the value of the other without knowing $\mu$ or $\sigma^2$. You could also say the pair $\left(\bar Y, \frac 1 n \sum_{i=1}^n (Y_i-\bar Y)^2\right)$ is a sufficient statistic for $(\mu,\sigma^2)$, where $\bar Y=(Y_1+\cdots+Y_n)/n$.

The phrase "the sufficient statistic" is a bit misleading, since it implies there is only one. For example, the whole sample $(Y_1,\ldots,Y_n)$ is a sufficient statistic. But you probably want a minimal sufficient statistic, i.e. a sufficient statistic that is a function of every other sufficient statistic, where the function in no way depends on $(\mu,\sigma^2)$. A minimal sufficient statistic is unique up to invertible re-expression, so the pair $\left(\sum_{i=1}^n Y_i,\, \sum_{i=1}^n Y_i^2 \right)$ is one, the pair $\left( \sum_{i=1}^n Y_i^2,\, \sum_{i=1}^n Y_i \right)$ is one, and the pair $\left(\bar Y, \frac 1 n \sum_{i=1}^n (Y_i-\bar Y)^2\right)$ is one. They are equivalent in that from any of them you can compute the others without knowing $(\mu,\sigma^2)$.
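That equivalence can be made concrete: from $\left(\sum Y_i, \sum Y_i^2\right)$ one can recover $\left(\bar Y, \frac 1 n \sum (Y_i-\bar Y)^2\right)$ and back, with no reference to $\mu$ or $\sigma^2$. A small sketch (sample values here are arbitrary, just for demonstration):

```python
import random

# From the pair (sum Y_i, sum Y_i^2), recover (Ybar, (1/n) sum (Y_i - Ybar)^2)
# and back again, using no knowledge of mu or sigma^2.
random.seed(1)
n = 8
y = [random.gauss(0, 1) for _ in range(n)]

T1 = sum(y)
T2 = sum(v ** 2 for v in y)

# From the sums to the (mean, variance) pair, using the identity
# (1/n) sum (y_i - ybar)^2 = T2/n - ybar^2.
ybar = T1 / n
var = T2 / n - ybar ** 2

# ...and back again.
T1_back = n * ybar
T2_back = n * (var + ybar ** 2)

print(abs(T1 - T1_back) < 1e-9 and abs(T2 - T2_back) < 1e-9)
```

Both directions are exact algebraic transformations of the statistic, which is why all three pairs qualify as minimal sufficient statistics.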