Sufficient statistic for normal distribution with unknown mean and known variance


Let $X_1,\ldots,X_n$ be a random sample from a normal distribution $N(\theta,1)$.

a) Find a sufficient statistic for $\theta$.
b) Is $S_n^2$ a sufficient statistic for $\theta$?

My answers

For part a)

Since the joint p.d.f. is $\frac{1}{(2\pi)^{n/2}}e^{{-1 \over 2}\sum(x_i-\theta)^2}$, I can say that $\sum X_i$ is a sufficient statistic for $\theta$ because, after expanding the exponent, the only term involving both the data and $\theta$ is $\theta\sum x_i$, so the factor containing $\theta$ depends on the data only through the value of $\sum X_i$, right?
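As a numerical sanity check of this sufficiency claim (a sketch using NumPy; the two samples below are made up): if two samples have the same $\sum x_i$, the ratio of their likelihoods should not depend on $\theta$, so the log-likelihood difference should be the same constant for every $\theta$.

```python
import numpy as np

def log_lik(x, theta):
    """Exact log-likelihood of an i.i.d. N(theta, 1) sample."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((x - theta) ** 2)

# Two made-up samples with the same sum (3.0) but different values
x1 = np.array([1.0, 2.0, 0.0])
x2 = np.array([0.5, 0.5, 2.0])

# The log-likelihood difference is the same for every theta,
# i.e. the likelihood ratio is free of theta
for theta in [-1.0, 0.0, 2.5]:
    print(log_lik(x1, theta) - log_lik(x2, theta))  # prints -0.25 each time
```

The constant is $-\frac12(\sum x_1^2 - \sum x_2^2)$, exactly the part of the density that the factorization theorem absorbs into the factor that is free of $\theta$.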

For part b)

Expanding the joint p.d.f. as $\frac{1}{(2\pi)^{n/2}}e^{{-1 \over 2}\sum(x_i-\theta)^2} = \frac{1}{(2\pi)^{n/2}}e^{{-1 \over 2}\sum(x_i- \bar x + \bar x-\theta)^2} = \frac{1}{(2\pi)^{n/2}}e^{{-1 \over 2}\Big[\sum(x_i- \bar x)^2+n(\bar x-\theta)^2\Big]} = \frac{1}{(2\pi)^{n/2}}e^{{-1 \over 2}\Big[(n-1)S_n^2+n(\bar x-\theta)^2\Big]}$, where $S_n^2 = \frac{1}{n-1}\sum(x_i-\bar x)^2$. (The cross term $2(\bar x - \theta)\sum(x_i-\bar x)$ vanishes because $\sum(x_i-\bar x)=0$.)
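The decomposition of the sum of squares used above can be checked numerically (a sketch; the sample and $\theta$ are made up):

```python
import numpy as np

x = np.array([0.3, -1.2, 2.5, 0.8])   # made-up sample
theta = 1.7                            # arbitrary made-up theta
xbar = x.mean()

lhs = np.sum((x - theta) ** 2)
rhs = np.sum((x - xbar) ** 2) + len(x) * (xbar - theta) ** 2
print(lhs, rhs)   # the two sides agree up to floating-point error
```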

Now can I say $S_n^2$ is a sufficient statistic for $\theta$? Is it a problem that $\bar x$ appears in the function $g(S_n^2,\theta)$? Because $\bar x$ is a particular value, I thought $g(S_n^2,\theta)$ depends on the data only through the value of $S_n^2$.

Answer 1

(a) Taking your joint probability density of $\frac{1}{(2\pi)^{n/2}}e^{{-1 \over 2}\sum(x_i-\theta)^2}$, you can expand this into $$\left(\frac{1}{(2\pi)^{n/2}}e^{-\sum x_i^2 /2}\right)\left(e^{-n\theta^2/2+\theta \sum x_i }\right)$$ where the left part does not depend on $\theta$ and the right part is a function of $\theta$ and $\sum x_i$, implying by Fisher's factorisation theorem that $\sum x_i$ is a sufficient statistic for $\theta$.

(b) $S_n^2$ (you do not say, but presumably the sample variance, or possibly the sample second moment about $0$ or perhaps $\sum x_i^2$) is not a sufficient statistic for $\theta$. One way of seeing this is that multiplying all the $x_i$ observations by $-1$ would not change $S_n^2$, and so it cannot give any information to distinguish between the population mean of the original normal distribution being $\theta$ or being $-\theta$.
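The sign-flip argument is easy to see directly: negating every observation leaves the sample variance unchanged, so $S_n^2$ cannot distinguish data drawn from $N(\theta,1)$ from data drawn from $N(-\theta,1)$. A minimal sketch with made-up data:

```python
import numpy as np

x = np.array([2.1, 1.4, 3.0, 2.6])   # made-up sample
s2_x = np.var(x, ddof=1)             # sample variance of x
s2_negx = np.var(-x, ddof=1)         # sample variance of -x

print(s2_x, s2_negx)   # identical: S_n^2 ignores the sign of the data
```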

Answer 2

$$ \sum_{i=1}^n (x_i - \theta)^2 = \left( \sum_{i=1}^n x_i^2 \right) -2\theta \left( \sum_{i=1}^n x_i \right) + n\theta^2 $$ Therefore $$ \exp \left( \frac {-1} 2 \sum_{i=1}^n (x_i-\theta)^2 \right) = \underbrace{ e^{-n\theta^2/2}\cdot \exp\left( \theta\sum_{i=1}^n x_i \right)}_{\large\text{first factor}} \cdot \underbrace{ \exp\left( \frac{-1} 2\sum_{i=1}^n x_i^2 \right) }_{ \large \text{second factor}} $$

The first factor depends on $(x_1,\ldots,x_n)$ only through $\displaystyle\sum_{i=1}^n x_i.$ The second factor does not depend on $\theta.$

Therefore by Fisher's factorization theorem, $\displaystyle\sum_{i=1}^n x_i$ is sufficient for $\theta.$
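As a numerical check of this factorization (a sketch; the sample and $\theta$ are made up), the full exponential and the product of the two factors should agree:

```python
import numpy as np

x = np.array([1.2, -0.4, 0.9])   # made-up sample
theta = 0.7                      # arbitrary made-up theta
n = len(x)

full = np.exp(-0.5 * np.sum((x - theta) ** 2))
first = np.exp(-n * theta**2 / 2) * np.exp(theta * np.sum(x))
second = np.exp(-0.5 * np.sum(x ** 2))

print(full, first * second)   # agree up to floating-point error
```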

(Note: "$N(\theta,1)$" means the mean is unknown and the variance is known.)