Sufficient statistic for $N(\theta, c\theta^2)$ and symmetric confidence interval using $\bar{X}$


Exercise:

Let $X_1, \dots, X_n$ be a random sample from the Normal Distribution $N(\theta,c\theta^2)$ where $c > 0$ is a known constant and $\theta \in \mathbb R$ an unknown parameter.
i) Find a sufficient statistic for $\theta$.
ii) Using only the statistic $\bar{X}$, construct a $100(1 - a)\%$ confidence interval for $\theta$.

Attempt:

i)\begin{align*}p(x \mid c,\theta) &= \prod_{i=1}^n(2\pi c\theta^2)^{-1/2}\exp\big\{-(x_i-\theta)^2/(2c\theta^2)\big\}\\ &=(2\pi c\theta^2)^{-n/2}\exp\bigg\{-\frac{1}{2c\theta^2}\sum_{i=1}^n(x_i-\theta)^2\bigg\}\\ &=(2\pi c\theta^2)^{-n/2}\exp\bigg\{-\frac{1}{2c\theta^2}\bigg(\sum_{i=1}^nx_i^2 -2\theta\sum_{i=1}^nx_i+n\theta^2\bigg)\bigg\}. \end{align*} From here, a sufficient statistic can be read off using the Fisher-Neyman factorization theorem.

ii) How would one proceed to find the confidence interval for $\theta$ that is asked for, though?


There are 2 best solutions below

11
On BEST ANSWER

After some days, I managed to work out a complete answer, and I am posting it for the sake of the bounty set by Clarinetist.

$\textbf{i)}$
\begin{align*}
p(\mathbf x \mid c,\theta) &= \prod_{i=1}^n(2\pi c\theta^2)^{-1/2}\exp\big\{ -(x_i-\theta)^2/(2c\theta^2)\big\}\\
&=(2\pi c\theta^2)^{-n/2}\exp\bigg\{-\frac{1}{2c\theta^2}\sum_{i=1}^n(x_i-\theta)^2\bigg\}\\
&=(2\pi c\theta^2)^{-n/2}\exp\bigg\{-\frac{1}{2c\theta^2}\bigg(\sum_{i=1}^nx_i^2 -2\theta\sum_{i=1}^nx_i+n\theta^2\bigg)\bigg\}\\
&=(2\pi c \theta^2)^{-n/2}\exp\bigg\{-\frac{1}{2c\theta^2}\sum_{i=1}^nx_i^2+\frac{1}{c\theta}\sum_{i=1}^nx_i - \frac{n}{2c}\bigg\}\\
&=(2\pi c \theta^2)^{-n/2}\exp\bigg\{ -\frac{1}{2c\theta^2}\sum_{i=1}^nx_i^2+\frac{1}{c\theta}\sum_{i=1}^nx_i \bigg\}\cdot \exp\bigg\{-\frac{n}{2c}\bigg\}
\end{align*}

Recall the Fisher-Neyman factorization criterion: if the probability function $p(\mathbf{x}\mid\theta)$ can be written as $p(\mathbf x\mid \theta)=G(\mathbf t(\mathbf x),\theta)H(\mathbf x)$, where $\mathbf t(\mathbf x) = (t_1(\mathbf x), \dots, t_k(\mathbf x))^\mathbf T$, then the statistic $\mathbf t(\mathbf x)$ is sufficient for the parameter $\theta$ in the statistical model $\{ X, \mathcal X, p(x\mid\theta), \theta \in \Theta\}$.

For our specific case, consider the functions:

$$G(\mathbf t, \theta) =(2\pi c \theta^2)^{-n/2}\exp\bigg\{-\frac{1}{2c \theta^2}t_2(\mathbf x) + \frac{1}{c\theta}t_1(\mathbf x)\bigg\}$$

$$H(\mathbf x) = \exp\bigg\{-\frac{n}{2c}\bigg\}$$

Indeed, our probability function can then be written as the product of these two functions, where:

$$t_1(\mathbf x) = \sum_{i=1}^n x_i, \quad t_2(\mathbf x) = \sum_{i=1}^n x_i^2$$

Note that $c > 0$ is a known constant, which is why it may appear in these expressions when we apply the Fisher-Neyman factorization criterion.

Thus, a sufficient statistic for the given model is:

$$\mathbf t(\mathbf x) = (t_1(\mathbf x), t_2(\mathbf x))^\mathbf T=\bigg(\sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2\bigg)^\mathbf T$$
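A quick numerical check may make the sufficiency claim more concrete. The sketch below, assuming made-up values `theta_true = 2.0` and `c = 0.5` and hypothetical helper names, evaluates the log-density of $N(\theta, c\theta^2)$ data once term by term and once from $n$, $t_1 = \sum x_i$ and $t_2 = \sum x_i^2$ alone, and confirms that the two agree for several values of $\theta$.

```python
import math
import random


def log_likelihood_full(xs, theta, c):
    """Log-density of i.i.d. N(theta, c*theta^2) data, summed term by term."""
    var = c * theta ** 2
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - theta) ** 2 / (2 * var)
        for x in xs
    )


def log_likelihood_from_stats(n, t1, t2, theta, c):
    """Same log-density, computed only from n, t1 = sum(x) and t2 = sum(x^2)."""
    var = c * theta ** 2
    quad = t2 - 2 * theta * t1 + n * theta ** 2  # equals sum((x - theta)^2)
    return -0.5 * n * math.log(2 * math.pi * var) - quad / (2 * var)


random.seed(0)
theta_true, c = 2.0, 0.5  # illustrative values, not from the exercise
xs = [random.gauss(theta_true, math.sqrt(c) * abs(theta_true)) for _ in range(20)]
t1, t2 = sum(xs), sum(x * x for x in xs)

# The data enter the likelihood only through (t1, t2), for every theta tried.
for theta in (-1.5, 0.7, 2.0, 3.1):
    full = log_likelihood_full(xs, theta, c)
    reduced = log_likelihood_from_stats(len(xs), t1, t2, theta, c)
    assert math.isclose(full, reduced, rel_tol=1e-9, abs_tol=1e-9)
```

Two different samples with the same $(t_1, t_2)$ would therefore give the same likelihood function of $\theta$, which is what sufficiency expresses.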

$\textbf{ii)}$

For a random sample $\mathbf X = (X_1, \dots, X_n)^\mathbf T$ from $\{\mathbf X, \mathbb R, \mathbf N(μ,σ^2),(μ,σ) \in \mathbb R \times \mathbb R^+\}$, with sample mean $\bar{X}$ and sample standard deviation $S$, we have:

$$T = \frac{\bar{X}-μ}{S/\sqrt{n}} \sim \mathbf{St}(n-1)$$

Thus, it is possible to find an interval $(c_1,c_2) \subset \mathbb R$, such that :

$$\mathbb P\bigg[c_1 < \frac{\bar{X}-μ}{S/\sqrt{n}} < c_2 \bigg] = 1-a$$

Because Student's $t$ distribution is symmetric around $0$, the interval $(c_1,c_2)$ has minimum length when $-c_1=c_2=t_{n-1,a/2}$, where $t_{n-1,a/2}$ is such that $\mathbb P [ T > t_{n-1,a/2}] = \frac{1}{2}\mathbb P[|T| > t_{n-1,a/2}]=a/2$ with $T\sim \mathbf{St}(n-1)$. Thus, with probability $\gamma = 1-a$, we have the relation:

$$-t_{n-1,a/2} < \frac{\bar{X}-μ}{S/\sqrt{n}} < t_{n-1,a/2}$$

from which the $100 \; \gamma \; \%$ confidence interval for the mean $μ$ is:

$$\bar{X}-t_{n-1,a/2}S/\sqrt{n} < μ < \bar{X} + t_{n-1,a/2}S/\sqrt{n}$$

In our specific exercise, $μ=\theta$, and thus the $100 \; \gamma \; \% = 100(1-a) \; \%$ confidence interval for $\theta$ is:

$$\bar{X}-t_{n-1,a/2}S/\sqrt{n} < \theta < \bar{X} + t_{n-1,a/2}S/\sqrt{n}$$

Note that this expression involves not only the statistic $\bar{X}$ but also the sample standard deviation $S$, which is:

$$S = \sqrt{\frac{\sum_{i=1}^n (X_i-\bar{X})^2}{n-1}}$$
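As a rough illustration of the interval above, here is a minimal sketch that computes $\bar{X} \pm t_{n-1,a/2}\,S/\sqrt{n}$ for a small sample. The data are made up, the function name `t_interval` is hypothetical, and the quantile is passed in by hand because Python's standard library has no Student $t$ quantile function; the value 2.262 is the tabulated $t_{9,\,0.025}$, matching $n = 10$ and $a = 0.05$.

```python
import math
from statistics import mean, stdev


def t_interval(xs, t_quantile):
    """Return (lower, upper) = xbar -/+ t_quantile * S / sqrt(n)."""
    n = len(xs)
    xbar = mean(xs)
    s = stdev(xs)  # sample standard deviation, denominator n - 1
    half_width = t_quantile * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Made-up sample of size n = 10 (illustrative only).
sample = [1.8, 2.4, 0.9, 3.1, 2.2, 1.5, 2.9, 0.4, 2.0, 2.6]

# t_{9, 0.025} = 2.262 from a standard t table, giving a 95% interval.
lower, upper = t_interval(sample, t_quantile=2.262)
print(f"95% interval for theta: ({lower:.3f}, {upper:.3f})")
```

The interval is centred at the sample mean, and its half-width is driven entirely by $S$, which is why it cannot be computed from $\bar{X}$ alone.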

3
On

Note that $$ \overline{X} \sim N\left(\theta, \frac{c\theta^2}{n}\right) \Longrightarrow \frac{\sqrt{n}\,(\overline{X} - \theta)}{\sqrt{c}\,\theta} \sim N(0, 1). $$ Denote $\phi(x) = \dfrac{1}{\sqrt{2\pi}}\exp\left( -\dfrac{x^2}{2} \right)$, $\displaystyle \Phi(x) = \int_{-\infty}^x \phi(t) \,\mathrm{d}t$. For any $-\sqrt{\dfrac{n}{c}} < a < b$ and $\overline{X} > 0$ (the inequalities on $\theta$ are reversed when $\overline{X} < 0$), because $$ a \leqslant \frac{\sqrt{n}\,(\overline{X} - \theta)}{\sqrt{c}\,\theta} \leqslant b \Longleftrightarrow \frac{\overline{X}}{1 + b\sqrt{c/n}} \leqslant \theta \leqslant \frac{\overline{X}}{1 + a\sqrt{c/n}}, $$ to make a $(1 - \alpha)$ confidence interval, it is equivalent to require that $$ \Phi(b) - \Phi(a) = 1 - \alpha. $$ In particular, to make an unbiased confidence interval, an additional requirement is $$ \left( a + \sqrt{\frac{n}{c}} \right)\phi(a) = \left( b + \sqrt{\frac{n}{c}} \right)\phi(b). $$
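The construction can be sketched numerically under a few stated assumptions: $\overline{X}$ is taken to be the sample mean, so that $\sqrt{n}(\overline{X}-\theta)/(\sqrt{c}\,\theta) \sim N(0,1)$, and the simple equal-tailed choice $-a = b = z_{\alpha/2}$ is used rather than the unbiased choice described above. The function names and the simulation settings are hypothetical and only serve to check the coverage.

```python
import math
import random
from statistics import NormalDist, fmean


def mean_only_interval(xbar, n, c, alpha=0.05):
    """Equal-tailed interval for theta using only xbar.

    Inverts -z <= sqrt(n) * (xbar - theta) / (sqrt(c) * theta) <= z,
    i.e. 1 - r <= xbar / theta <= 1 + r with r = z * sqrt(c / n).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    r = z * math.sqrt(c / n)
    if r >= 1:
        # Corresponds to the requirement a > -sqrt(n / c) with a = -z.
        raise ValueError("need z * sqrt(c / n) < 1 for a bounded interval")
    end1, end2 = xbar / (1 + r), xbar / (1 - r)
    # For negative xbar the two endpoints swap order.
    return (end1, end2) if end1 <= end2 else (end2, end1)


def simulated_coverage(theta, c, n, alpha=0.05, reps=4000, seed=1):
    """Fraction of simulated samples whose interval contains theta."""
    rng = random.Random(seed)
    sd = math.sqrt(c) * abs(theta)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(theta, sd) for _ in range(n)])
        lo, hi = mean_only_interval(xbar, n, c, alpha)
        hits += lo <= theta <= hi
    return hits / reps


print(mean_only_interval(xbar=2.0, n=25, c=0.5))
print(simulated_coverage(theta=2.0, c=0.5, n=25))
```

The simulated coverage should land close to $1 - \alpha$ for either sign of $\theta$, since the pivot is standard normal in both cases.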