Factorization theorem proof question


From Statistical Inference by Casella and Berger:

$(1)$ $T$ is a statistic for $X$,

$(2)$ $q(T(x) | \theta)$ is the pmf of $T(X)$ and $\theta$ is a parameter,

$(3)$ $A_{T(x)} = \{y : T(y) = T(x)\}$,

$(4)$ There exist functions $g(t|\theta)$ and $ h(x)$ such that for each $(x, \theta)$, $f(x|\theta) = g(T(x)|\theta)h(x)$,

Then, by the definition of the pmf of $T$, $$q(T(x) | \theta) = \sum_{A_{T(x)}}g(T(y)|\theta)h(y)$$

Can someone explain how the definition of the pmf of $T$ shows this equality? I can't figure out a way to show that these two are equal.


What I figure is:

$$q(T(x) | \theta) = P_{\theta}(T(X) = T(x)) = P_{\theta}(\{w : T(X(w) ) = T(X(w_x))\}) = \sum_{w : T(X(w)) = T(X(w_x))}P_{\theta}(\{w\})$$

where $w_x$ is the value of $w$ in the sample space that gives $X(w_x) = x$, and

$$g(T(y)|\theta)h(y) = f(y|\theta) = P_{\theta}(X = y) = P_{\theta}(\{s : X(s) = X(s_y)\}) = \sum_{s : X(s) = X(s_y)}P_{\theta}(\{s\})$$

where $s_y$ is defined analogously to $w_x$, and

$$\sum_{A_{T(x)}} = \sum_{y \in A_{T(x)}} = \sum_{y: T(y) = T(x)} = \sum_{a \in \Omega: T(X(a)) = T(X(a_x))}$$

but from here I can't see the relationship.
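The identity in question can be checked numerically on a small example. The following sketch (not from the original post; the sample $x$, $n = 3$, and $\theta = 0.3$ are made-up values) uses i.i.d. Bernoulli$(\theta)$ trials with $T(y) = \sum_i y_i$, for which the factorization holds with $g(t|\theta) = \theta^t(1-\theta)^{n-t}$ and $h(y) \equiv 1$, and $T(X)$ is Binomial$(n, \theta)$:

```python
from itertools import product
from math import comb, isclose

theta, n = 0.3, 3
x = (1, 0, 1)                      # hypothetical observed sample

def T(y):                          # statistic: the sample total
    return sum(y)

def g(t, theta):                   # factor depending on y only through T(y)
    return theta**t * (1 - theta)**(n - t)

def h(y):                          # factor free of theta (here identically 1)
    return 1.0

# f(y|theta) = g(T(y)|theta) h(y) is the joint Bernoulli pmf.
# q(T(x)|theta): pmf of T(X), which is Binomial(n, theta).
t = T(x)
q = comb(n, t) * theta**t * (1 - theta)**(n - t)

# Sum g(T(y)|theta) h(y) over A_{T(x)} = {y : T(y) = T(x)}
total = sum(g(T(y), theta) * h(y)
            for y in product([0, 1], repeat=n) if T(y) == t)

assert isclose(q, total)           # q(T(x)|theta) equals the sum over A_{T(x)}
```

Here $A_{T(x)}$ contains the three tuples with total $2$, each contributing $\theta^2(1-\theta)$, which matches the binomial pmf at $t = 2$.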


There are 2 best solutions below


For the discrete case: let $X_1, X_2, \ldots, X_n$ have joint p.m.f. $P_{\theta}(x)$, and let $T(X)$ be a function of the $X_i$'s. For each tuple $(x_1, x_2, \ldots, x_n)$ satisfying $T(x_1, x_2, \ldots, x_n) = t$, define $A_j$ as the event that $(X_1, X_2, \ldots, X_n)$ equals that tuple; in your notation, $j$ runs from $1$ to $|A_{T(x)}|$. The events $A_j$ are mutually exclusive, hence

$P_\theta[T(X)=t] = P_\theta(\cup_j A_j) = \sum_{j}{P_\theta(A_j)} = \sum_{x : T(x)=t}{P_\theta(x)}$

Hence the result.
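The disjoint-union step above can be verified directly: partition the outcome space by the value of $T$, sum the probabilities of the singleton events $A_j$ in each block, and compare with the pmf of $T(X)$. A minimal sketch, again assuming i.i.d. Bernoulli$(\theta)$ trials with $T(x) = \sum_i x_i$ (so $T(X)$ is Binomial$(n, \theta)$; the values $\theta = 0.4$, $n = 3$ are made up):

```python
from itertools import product
from math import comb, isclose

theta, n = 0.4, 3

def p(x):
    """Probability of one outcome tuple x under i.i.d. Bernoulli(theta)."""
    return theta**sum(x) * (1 - theta)**(n - sum(x))

# Each singleton {x} with T(x) = sum(x) = t is one event A_j; they are
# disjoint, so P[T(X) = t] is the sum of their probabilities.
for t in range(n + 1):
    lhs = comb(n, t) * theta**t * (1 - theta)**(n - t)   # Binomial(n, theta) pmf
    rhs = sum(p(x) for x in product([0, 1], repeat=n) if sum(x) == t)
    assert isclose(lhs, rhs)
```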


Here is the way I understood it. For this direction of the proof we want to show that $$\frac{f(x|\theta)}{q(T(x)| \theta)} = \frac{P_{\theta}(X=x)}{P_{\theta}(T(X) = T(x))}$$ is constant as a function of $\theta$. We can rewrite the denominator of this fraction as $$ P_{\theta}(T(X) = T(x)) = \sum_{y \in A_{T(x)}}P_{\theta}(X = y). $$ This is because there are, in general, multiple samples that yield the same value of the statistic, so the probability of observing some value of $T(X)$ equals the probability of observing one of the samples that yields this value. The set of all such samples is precisely $A_{T(x)}$, so we sum the probabilities of the samples in $A_{T(x)}$. The summation is valid because observing different samples are disjoint events, as Vishaal Sudarsan noted.

Now we use assumption $(4)$, that $P_{\theta}(X = y) = f(y|\theta) = g(T(y)|\theta)h(y)$. Thus, $$q(T(x)|\theta) = P_{\theta}(T(X) = T(x)) = \sum_{y \in A_{T(x)}}P_{\theta}(X = y) = \sum_{y \in A_{T(x)}}g(T(y)|\theta)h(y).$$

Hopefully that helps.
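The constancy of the ratio $f(x|\theta)/q(T(x)|\theta)$ can also be checked numerically. A sketch under the same made-up Bernoulli setup as above ($n = 3$, sample $x = (1, 0, 1)$, $T(x) = \sum_i x_i$): the ratio comes out to $1/\binom{n}{t}$ for every $\theta$, i.e. it does not depend on $\theta$.

```python
from math import comb, isclose

n, x = 3, (1, 0, 1)        # hypothetical sample; T(x) = sum(x) = 2
t = sum(x)

def f(x, theta):
    """Joint Bernoulli pmf: theta^sum(x) * (1-theta)^(n-sum(x))."""
    return theta**sum(x) * (1 - theta)**(n - sum(x))

def q(t, theta):
    """pmf of T(X) = X1 + ... + Xn, i.e. Binomial(n, theta)."""
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

ratios = [f(x, th) / q(t, th) for th in (0.1, 0.3, 0.5, 0.9)]

# Every ratio equals 1/comb(n, t), independent of theta
assert all(isclose(r, 1 / comb(n, t)) for r in ratios)
```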