Why are there (r-s-1) degrees of freedom in a Chi-Square GoF Test for Composite Hypotheses?


In a chi-square goodness-of-fit test for composite hypotheses, we are interested in whether the distribution of a random variable $X$ belongs to a family of distributions $\{\mathbb{P}_{\theta} : \theta \in \Theta\}$. Here $X$ takes values in a discrete set $B_1, B_2, \dots, B_r$ with probabilities $\mathbb{P}(X = B_1) = p_1$, $\mathbb{P}(X = B_2) = p_2$, $\dots$, $\mathbb{P}(X = B_r) = p_r$. Write $p_j(\theta) \equiv \mathbb{P}_{\theta}(X = B_j)$, and let $\theta^*$ denote the maximum likelihood estimate of $\theta$. The claim is that if $p_j = p_j(\theta)$ for all $j$ and some $\theta \in \Theta$ (i.e. the null hypothesis holds), then the statistic
$$T_C = \sum_{j=1}^r \frac{(\nu_j - np_j(\theta^*))^2}{np_j(\theta^*)} \xrightarrow{d} \chi^2_{r-s-1}$$
converges in distribution to a chi-square distribution with $r-s-1$ degrees of freedom, where $s$ is the dimension of the parameter set $\Theta$.

In the paper I was reading about this test, which can be accessed here, the number of degrees of freedom of this limiting distribution is stated without proof. I have already worked through and understood Pearson's theorem, which states that
$$T_S = \sum_{j=1}^r \frac{(\nu_j - np_j)^2}{np_j} \xrightarrow{d} \chi^2_{r-1}$$
in the situation where we want to know whether our counts follow one specific distribution, rather than some member of a family of distributions.

My question is: is there a simple extension of Pearson's theorem that makes it easy to understand rigorously why the degrees of freedom should be $r-s-1$ in the composite-hypothesis case?
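As an empirical sanity check of the claim (a simulation, not a proof), here is a minimal Monte Carlo sketch. It assumes a hypothetical one-parameter family, $\mathbb{P}_\theta = \text{Binomial}(4, \theta)$, so the cells are $B_j = j-1$ for $j = 1, \dots, 5$ ($r = 5$, $s = 1$), with true parameter $\theta = 0.5$; the sample size, number of replications and seed are arbitrary choices of mine. Since a $\chi^2_k$ variable has mean $k$, the average of $T_S$ should sit near $r-1 = 4$ and the average of $T_C$ near $r-s-1 = 3$.

```python
import math
import random

def binom_pmf(k, m, theta):
    """p_j(theta): P(X = k) for X ~ Binomial(m, theta)."""
    return math.comb(m, k) * theta**k * (1 - theta) ** (m - k)

def pearson_stat(counts, probs):
    """Sum over cells of (nu_j - n p_j)^2 / (n p_j)."""
    n = sum(counts)
    return sum((nu - n * p) ** 2 / (n * p) for nu, p in zip(counts, probs))

def simulate(m=4, theta_true=0.5, n=400, reps=1000, seed=0):
    """Return lists of T_S (true p_j) and T_C (fitted p_j(theta*)) over many samples."""
    rng = random.Random(seed)
    r = m + 1  # cells correspond to X = 0, 1, ..., m
    true_probs = [binom_pmf(k, m, theta_true) for k in range(r)]
    t_simple, t_composite = [], []
    for _ in range(reps):
        counts = [0] * r
        for _ in range(n):
            # One Binomial(m, theta_true) draw as a sum of m Bernoulli trials.
            x = sum(rng.random() < theta_true for _ in range(m))
            counts[x] += 1
        # For this family, maximizing prod_j p_j(theta)^{nu_j} gives
        # theta* = sum_k k * nu_k / (m * n). (theta* = 0 or 1 would make a
        # fitted cell probability zero; that is practically impossible here.)
        theta_hat = sum(k * c for k, c in enumerate(counts)) / (m * n)
        fitted = [binom_pmf(k, m, theta_hat) for k in range(r)]
        t_simple.append(pearson_stat(counts, true_probs))
        t_composite.append(pearson_stat(counts, fitted))
    return t_simple, t_composite

t_s, t_c = simulate()
mean_s = sum(t_s) / len(t_s)
mean_c = sum(t_c) / len(t_c)

# 0.95 quantiles of the chi-square distribution with 3 and 4 degrees of freedom.
CRIT_DF3, CRIT_DF4 = 7.8147, 9.4877
rate_right = sum(t > CRIT_DF3 for t in t_c) / len(t_c)  # r - s - 1 = 3 df
rate_wrong = sum(t > CRIT_DF4 for t in t_c) / len(t_c)  # r - 1 = 4 df

print(f"mean T_S = {mean_s:.2f}  (chi^2 with r-1 = 4 df has mean 4)")
print(f"mean T_C = {mean_c:.2f}  (chi^2 with r-s-1 = 3 df has mean 3)")
print(f"T_C rejection rate at nominal 5%: df=3 -> {rate_right:.3f}, df=4 -> {rate_wrong:.3f}")
```

Estimating $\theta$ from the same counts pulls the fitted probabilities $p_j(\theta^*)$ toward the observed frequencies, so $T_C$ is smaller on average than $T_S$; referring $T_C$ to the $\chi^2_{r-1}$ critical value would therefore make the test conservative.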


P.S. In the statements above, $\nu_j$ is the observed number of the $n$ observations $X_1, \dots, X_n$ that take the value $B_j$. The likelihood function can therefore be written as $\varphi(\theta) = p_1(\theta)^{\nu_1} p_2(\theta)^{\nu_2} \cdots p_r(\theta)^{\nu_r}$, and $\theta^*$ is the value that maximizes this function over $\Theta$. I have used the same notation as the linked paper, both in case I've left anything out and for continuity of discussion.
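To make the definition of $\theta^*$ concrete, here is a small sketch that maximizes $\log\varphi(\theta) = \sum_j \nu_j \log p_j(\theta)$ numerically. It assumes a hypothetical family $p_j(\theta) = \binom{m}{j-1}\theta^{j-1}(1-\theta)^{m-j+1}$ (Binomial$(m, \theta)$ with $m = 4$) and made-up counts $\nu_j$. Golden-section search is my choice of optimizer, valid here because this log-likelihood is concave in $\theta$; the result is checked against the closed form $\theta^* = \sum_j (j-1)\nu_j/(mn)$ that holds for this particular family.

```python
import math

def log_likelihood(theta, counts, m):
    """log phi(theta) = sum_j nu_j * log p_j(theta) for the Binomial(m, theta) family."""
    return sum(
        nu * math.log(math.comb(m, k) * theta**k * (1 - theta) ** (m - k))
        for k, nu in enumerate(counts)
    )

def mle_golden(counts, m, lo=1e-9, hi=1 - 1e-9, tol=1e-10):
    """Maximize the unimodal log-likelihood on (0, 1) by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts, m) > log_likelihood(d, counts, m):
            b, d = d, c  # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2

m = 4
counts = [30, 110, 150, 85, 25]  # hypothetical nu_1, ..., nu_5
n = sum(counts)

theta_numeric = mle_golden(counts, m)
theta_closed = sum(k * nu for k, nu in enumerate(counts)) / (m * n)
expected = [n * math.comb(m, k) * theta_closed**k * (1 - theta_closed) ** (m - k)
            for k in range(m + 1)]  # n p_j(theta*), the denominators in T_C

print(theta_numeric, theta_closed)
print([round(e, 2) for e in expected])
```

For these counts the closed form gives $\theta^* = 765/1600 = 0.478125$, and the numerical maximizer should agree to within the search tolerance.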


Update: The exact proof I am searching for is mentioned here, but I cannot find the relevant document on MIT's OCW website. Any link to a valid proof would be appreciated.
