Chi-square test (the principle used in C4.5's CVP pruning),
also called the chi-square statistic,
also called the chi-square goodness-of-fit test
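How can one prove that, for large $N$, the statistic below approximately follows a chi-square distribution with $(r-1)(c-1)$ degrees of freedom?
$\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}$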
where $E_{ij}=\frac{N_i\cdot N_j}{N}$,
$N$ is the total count over the whole dataset,
$N_i$ is the count of the sub-dataset whose samples share the $i$-th value of the feature,
$N_j$ is the count of the sub-dataset whose samples belong to the $j$-th class.
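To make the definitions concrete, here is a minimal Python sketch that computes the statistic from an $r\times c$ table of counts. The function name `chi_square_statistic` and the sample table are my own illustration, not taken from C4.5 or from any of the references, and the sketch assumes every row and column total is nonzero.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Rows are feature values (index i), columns are classes (index j).
    Assumes every row total and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]        # N_i
    col_totals = [sum(col) for col in zip(*table)]  # N_j
    n = sum(row_totals)                             # N
    stat = 0.0
    for i, row in enumerate(table):
        for j, x_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # E_ij = N_i * N_j / N
            stat += (x_ij - e_ij) ** 2 / e_ij
    dof = (len(table) - 1) * (len(table[0]) - 1)      # (r-1)(c-1)
    return stat, dof


# Hypothetical 2 x 2 table: every expected count is 15.
table = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(table)
print(stat, dof)  # statistic is 20/3 (about 6.667), dof is 1
```

In this example each cell deviates from its expected count of 15 by 5, so each of the four terms contributes $25/15$.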
Please help, thanks!
/------------------------------------------------
Here are some references I found, each of which is unclear at some point:
https://arxiv.org/pdf/1808.09171.pdf (does not explain why $k-1$ is used in formula (5))
https://www.math.utah.edu/~davar/ps-pdf-files/Chisquared.pdf (does not explain why $\Theta<1$ in going from (9) to (10))
https://arxiv.org/pdf/1808.09171 (page 4 does not explain what X* with a line over it denotes)
http://personal.psu.edu/drh20/asymp/fall2006/lectures/ANGELchpt07.pdf (page 109 does not explain why $\operatorname{Cov}(X_{ij},X_{il})=-p_ip_l$)
The proof uses the approximation that each cell count is roughly Poisson and hence, for large expected counts, roughly normal: $x_{ij}\approx\operatorname{Poisson}(E_{ij})\approx N(E_{ij},\,E_{ij})$. The $k-1$ appears because the constraint $\sum_i N_i=N$ removes one degree of freedom. $\Theta\le 1$ holds because the $\theta_i$ are probabilities.
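To see the lost degree of freedom concretely, here is a sketch of the simplest goodness-of-fit case, $k=2$. The notation is my own and not taken from the references: assume $x_1\sim\operatorname{Binomial}(N,p)$ and $x_2=N-x_1$, so that $x_2-N(1-p)=-(x_1-Np)$. Then

$$
\frac{(x_1-Np)^2}{Np}+\frac{(x_2-N(1-p))^2}{N(1-p)}
=(x_1-Np)^2\left(\frac{1}{Np}+\frac{1}{N(1-p)}\right)
=\left(\frac{x_1-Np}{\sqrt{Np(1-p)}}\right)^2 .
$$

By the central limit theorem the quantity inside the last square is approximately $N(0,1)$ for large $N$, so the two-term sum behaves like a single squared standard normal, that is, $\chi^2_1=\chi^2_{k-1}$ rather than $\chi^2_2$. The two terms are not independent because $x_1+x_2=N$ ties them together, and the same constraint is what removes one degree of freedom for general $k$.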