I have read that the degrees of freedom are calculated by subtracting $1$ from the number of states a random variable can be in. I am performing a goodness-of-fit test on a $64\times 32$ matrix where the expected frequency of any $a[i,j]$ is $50\,000$ and the observed frequency can lie between $0$ and $100\,000$. What confuses me is how to calculate the degrees of freedom. Since the observed value might range from $0$ to $100\,000$, will my degrees of freedom be equal to $100\,000-1$? Please advise.
2026-04-06 20:33:45
Determining the degree of freedom for a $\chi$-squared test
Asked by user328743 (https://math.techqa.club/user/user328743/detail)
If you are doing a chi-squared goodness-of-fit (GOF) test for data in a matrix with $r$ rows and $c$ columns, and finding the expected count in a cell as (row total)(column total)/(grand total), then $df = (r-1)(c-1).$
Degrees of freedom depend on the numbers of row and column categories, not on the observed and expected counts in the cells.
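As a small illustration of that rule, here is a Python sketch for the case where expected counts are computed from row and column totals. The function name `contingency_df` is made up for this example; the dimensions are the ones from the question.

```python
def contingency_df(rows: int, cols: int) -> int:
    """Degrees of freedom when expected counts come from row and column totals.

    Only the numbers of row and column categories enter the formula;
    the sizes of the counts in the cells play no role.
    """
    if rows < 2 or cols < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    return (rows - 1) * (cols - 1)


# The 64 x 32 matrix from the question: (64 - 1) * (32 - 1)
df = contingency_df(64, 32)
print(df)  # 1953
```

Note that a count range of $0$ to $100\,000$ never appears in the calculation.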
Note: That said, I have never done a chi-squared GOF test for counts in a matrix anywhere near as large as the one you are talking about. I think you should read about the assumptions of the GOF test and make sure they apply in your situation. If you have doubts, perhaps describe your situation, data, and goals on our sister 'statistics' (or 'crossvalidated') site, and ask whether there is a better way toward your goals. That site tends to get more people with active experience in 'big data' applications.
I'm not saying you are doing the wrong analysis, but something seems to be confusing you, and I'm not sure your simply-resolved question here is the one you really should be asking.
Addendum (posted later, based on information in Comments): I had a look at the paper you linked. It is not exactly entry level material for the main subject matter, which I will not pursue here. However, I think I have a clearer view of what you are trying to do with a statistical test.
Chi-squared test. The chi-squared GOF test you propose is based on $rc = 2048$ $X$-values, each with expectation $E = 50,000.$ For purposes of the test, you essentially ignore the matrix structure because you do not use it to get $E$ (already specified for each cell). Thus, your GOF statistic turns out to be
$$Q = \sum_{i=1}^{rc} \frac{(X_i - E)^2}{E}.$$
Under the null hypothesis that $E(X_i) \equiv E$, the test statistic is approximately $Chisq(rc).$ (The distinction between $df = rc$ and $df = rc - 1$ would hardly matter in practice, but the former is correct because you are not using your $X$-values to estimate $E$, nor using the total of the $X_i$.)
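To make the statistic concrete, here is a minimal Python sketch that evaluates $Q$ for a fixed, pre-specified $E$. The function name `gof_statistic` and the four toy counts are invented for illustration; the real data would have $rc = 2048$ cells.

```python
def gof_statistic(observed, expected):
    """Sum of (X_i - E)^2 / E over all cells, with E fixed in advance."""
    return sum((x - expected) ** 2 / expected for x in observed)


# Toy data: four cells instead of 2048, same expected count as in the question.
counts = [50_100, 49_900, 50_250, 49_750]
E = 50_000

q = gof_statistic(counts, E)
df = len(counts)  # E is specified, not estimated from the data, so df = number of cells

print(f"Q = {q:.3f} on {df} degrees of freedom")
```

For these toy counts the deviations are $\pm 100$ and $\pm 250$, so $Q = (2\cdot 100^2 + 2\cdot 250^2)/50\,000 = 2.9$.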
An assumption of the test is that $X_i$ are approximately normal so that $Z_i = (X_i - E)/\sqrt{E}$ is approximately standard normal, $Z_i^2 = (X_i - E)^2/E$ is approximately $Chisq(df=1)$, and $Q$ is approximately $Chisq(df=rc).$ Thus one would reject $H_0$ at the 5% level, if $Q \ge 2154.4,$ the value that cuts 5% from the upper tail of $Chisq(df = rc)$.
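The cutoff can be sanity-checked without a statistics package. The sketch below uses the Wilson-Hilferty approximation to the chi-squared quantile (an approximation swapped in here for convenience, not an exact quantile function), together with `statistics.NormalDist` from the Python standard library. The function name `chisq_upper_cutoff` is hypothetical.

```python
import math
from statistics import NormalDist


def chisq_upper_cutoff(df: int, alpha: float = 0.05) -> float:
    """Approximate value cutting probability alpha from the upper tail of Chisq(df).

    Wilson-Hilferty: (X / df)^(1/3) is roughly normal with
    mean 1 - 2/(9 df) and variance 2/(9 df).
    """
    z = NormalDist().inv_cdf(1 - alpha)  # upper-tail standard normal quantile
    a = 2 / (9 * df)
    return df * (1 - a + z * math.sqrt(a)) ** 3


rc = 64 * 32  # 2048 cells
cutoff = chisq_upper_cutoff(rc)
print(f"approximate 5% cutoff for df = {rc}: {cutoff:.1f}")
```

With $df = 2048$ the approximation should land very close to the 2154.4 quoted above.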
If the $X_i$ are counts distributed $Pois(\lambda = E),$ then $E(X_i) = E,\;$ $V(X_i) = E,\,$ and $SD(X_i) = \sqrt{E}.$ Certainly, the discrete distribution $Pois(50,000)$ is well approximated by $Norm(50,000, \sqrt{50,000}).$
Normal test. A simpler and somewhat similar test (of the null hypothesis that cell means average $E = 50,000$) would use the statistic $Z = (\bar X - E)/\sqrt{E/rc},$ where $\bar X$ is the mean of the $X_i.$ Under the same assumptions as above, $Z$ is approximately standard normal. Thus, one would reject $H_0$ at the 5% level if $|Z| \ge 1.96.$
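A minimal Python sketch of this statistic, taking $\bar X$ as the average of the cell counts. The function name `normal_test_z` and the toy counts are assumptions made for illustration only.

```python
import math


def normal_test_z(observed, expected):
    """Z = (mean(X) - E) / sqrt(E / n), using Var(X_i) = E as for Poisson counts."""
    n = len(observed)
    x_bar = sum(observed) / n
    return (x_bar - expected) / math.sqrt(expected / n)


# Toy data: four cells instead of 2048.
counts = [50_100, 49_900, 50_250, 49_950]
z = normal_test_z(counts, 50_000)

reject = abs(z) >= 1.96  # two-sided test at the 5% level
print(f"Z = {z:.4f}, reject H0: {reject}")
```

Here the average count is $50\,050$, so $Z = 50/\sqrt{12\,500} \approx 0.447$ and $H_0$ is not rejected.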
The following simulation in R of $m = 10,000$ tests of each type shows that they do have a significance level near 5%, when the $X_i \sim Pois(50,000).\;$ [A larger $m$ would get results a little closer to 5%; but not exactly, because the tests themselves are based on continuous distributions approximating discrete observations.]
The figure below on the left shows simulated values of $Q$ along with the density of $Chisq(df = rc)$; the area to the right of the vertical red line is 5%. On the right are simulated values of $Z$ along with the standard normal density curve; areas outside of the vertical red lines add to 5%.
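A rough Python sketch of a simulation along these lines is given here; it is not the R code referred to above. Several things in it are assumptions made to keep it short and dependent only on the standard library: the counts are drawn from a rounded normal approximation to $Pois(50,000)$ rather than from an exact Poisson sampler, the chi-squared cutoff comes from the Wilson-Hilferty approximation, only $m = 400$ replications are run instead of $10,000$, and all names (`simulate`, `chisq_upper_cutoff`, the seed) are invented for the example.

```python
import math
import random
from statistics import NormalDist

E = 50_000          # expected count per cell
N_CELLS = 64 * 32   # rc = 2048
M = 400             # replications; far fewer than 10,000 to keep runtime short


def chisq_upper_cutoff(df, alpha=0.05):
    """Wilson-Hilferty approximation to the upper-alpha point of Chisq(df)."""
    z = NormalDist().inv_cdf(1 - alpha)
    a = 2 / (9 * df)
    return df * (1 - a + z * math.sqrt(a)) ** 3


def simulate(m=M, seed=2024):
    """Return the fraction of simulated data sets rejected by each test."""
    rng = random.Random(seed)
    q_cut = chisq_upper_cutoff(N_CELLS)
    z_cut = NormalDist().inv_cdf(0.975)  # about 1.96
    sd = math.sqrt(E)
    q_rejections = 0
    z_rejections = 0
    for _ in range(m):
        total = 0
        q = 0.0
        for _ in range(N_CELLS):
            # Assumption: rounded normal draw standing in for a Pois(E) count.
            x = round(rng.gauss(E, sd))
            total += x
            q += (x - E) ** 2 / E
        z = (total / N_CELLS - E) / math.sqrt(E / N_CELLS)
        q_rejections += q >= q_cut
        z_rejections += abs(z) >= z_cut
    return q_rejections / m, z_rejections / m


q_rate, z_rate = simulate()
print(f"chi-squared test rejection rate: {q_rate:.3f}")
print(f"normal test rejection rate:      {z_rate:.3f}")
```

With so few replications, both rates should only be expected to fall in the general neighborhood of 5%; increasing `M` tightens the estimates at the cost of runtime.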