Chi-squared test statistic implementation in Python


I have been tasked with turning a spreadsheet that performs overall model testing of satellite measurements of land subsidence into a script. One cell of this spreadsheet contains the following formula. Although it is not stated explicitly, I suspect that the cell is calculating a chi-squared test statistic:

$$\chi^2 = \sum_{i=1}^n \frac{(z_i-z_{regr,i})^2}{\sigma_{assumed}^2}$$
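For reference, the spreadsheet formula translates almost literally into NumPy. This is a minimal sketch, assuming z holds the measured values, z_regr the regression predictions at the same points, and sigma_assumed the single user-defined standard deviation; the function name chi2_statistic is hypothetical:

```python
import numpy as np

def chi2_statistic(z, z_regr, sigma_assumed):
    """Spreadsheet-style statistic: sum of squared residuals divided by an assumed variance."""
    residuals = np.asarray(z, dtype=float) - np.asarray(z_regr, dtype=float)
    return float(np.sum(residuals**2) / sigma_assumed**2)

# Made-up values, only to show the call.
print(chi2_statistic([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], sigma_assumed=0.1))
```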

I have looked into Python libraries that could produce this result, since the goal of the script is to write as little code from scratch as possible. Unfortunately, every online resource I have found calculates the test statistic as follows:

$$\chi^2 = \sum_{i=1}^n \frac{(z_i-z_{regr,i})^2}{z_{regr, i}}$$

Part of the idea of this spreadsheet is that the standard deviation must be a user-defined input, and I cannot reconcile that with the formula above. Furthermore, common Python functions such as scipy.stats.chisquare only accept the following inputs:

f_obs : array_like
    Observed frequencies in each category.
f_exp : array_like, optional
    Expected frequencies in each category.  By default the categories are
    assumed to be equally likely.
ddof : int, optional
    "Delta degrees of freedom": adjustment to the degrees of freedom
    for the p-value.  The p-value is computed using a chi-squared
    distribution with ``k - 1 - ddof`` degrees of freedom, where `k`
    is the number of observed frequencies.  The default value of `ddof`
    is 0.

And the following outputs:

chisq : float or ndarray
    The chi-squared test statistic.  The value is a float if `axis` is
    None or `f_obs` and `f_exp` are 1-D.
p : float or ndarray
    The p-value of the test.  The value is a float if `ddof` and the
    return value `chisq` are scalars.

None of these parameters is a standard deviation.
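For comparison, Pearson's formula is what scipy.stats.chisquare computes. It is meant for counts: under the usual Poisson model the variance of each count equals its expected value, so the expected count plays the role that the assumed variance plays in the first equation. The returned p-value is the chi-squared survival function evaluated at the statistic with k - 1 - ddof degrees of freedom, as the docstring states. A small check with made-up counts (the ddof value is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Made-up counts; chisquare expects the observed and expected totals to match.
observed = np.array([18.0, 22.0, 30.0, 30.0])
expected = np.array([25.0, 25.0, 25.0, 25.0])
k = observed.size
ddof = 1  # illustrative: pretend one parameter was estimated from the data

# Pearson's statistic by hand: each expected count acts as the variance.
manual_stat = float(np.sum((observed - expected) ** 2 / expected))

stat, p_value = chisquare(f_obs=observed, f_exp=expected, ddof=ddof)

# The same p-value from the chi-squared survival function with k - 1 - ddof dof.
manual_p = float(chi2.sf(manual_stat, df=k - 1 - ddof))

print(manual_stat, stat)
print(manual_p, p_value)
```

Note that ddof only changes the p-value, never the statistic itself.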

My question is twofold:

  • Is the first equation I am showing correct? If so, why is it different from the chi-squared test statistic given in the literature?
  • If the first equation is correct, how can I use functions such as scipy.stats.chisquare to evaluate it, that is, with an assumed $\sigma$?
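One possible route: scipy.stats.chisquare has no σ argument, but the statistic from the first equation can be computed directly and its p-value taken from the scipy.stats.chi2 distribution. This is a minimal sketch, assuming the residuals are independent and normally distributed with the common standard deviation sigma_assumed, and that the degrees of freedom are n minus the number of fitted regression parameters. Both are assumptions; the spreadsheet may use a different convention. The helper name sigma_chi2_test is hypothetical:

```python
import numpy as np
from scipy.stats import chi2

def sigma_chi2_test(z, z_regr, sigma_assumed, n_params):
    """Upper-tail chi-squared test of residuals against an assumed sigma.

    Assumes independent normal residuals with standard deviation sigma_assumed
    and n - n_params degrees of freedom (n_params = fitted regression parameters).
    """
    residuals = np.asarray(z, dtype=float) - np.asarray(z_regr, dtype=float)
    stat = float(np.sum(residuals**2) / sigma_assumed**2)
    dof = residuals.size - n_params
    p_value = float(chi2.sf(stat, df=dof))  # P(chi2 with dof degrees >= stat)
    return stat, dof, p_value

# Illustrative use: a straight-line fit (2 parameters) to made-up data.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
z = np.array([0.0, -1.1, -1.9, -3.2, -3.9])
slope, intercept = np.polyfit(t, z, deg=1)
z_regr = slope * t + intercept
stat, dof, p_value = sigma_chi2_test(z, z_regr, sigma_assumed=0.2, n_params=2)
print(stat, dof, p_value)
```

Under these assumptions, a small p-value would suggest the residuals scatter more than sigma_assumed allows; whether a one- or two-sided test is appropriate depends on how the spreadsheet interprets the result.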

Thanks