Why is the degree of freedom not 5?


So we got our homework back, along with the answers to the questions. I understood that the degrees of freedom can be obtained as N - 1. We have 6 sets of data, so N = 6 and df = 5. But here it says it's 3. Can someone explain why?


There are 2 best solutions below

0
On BEST ANSWER

So we have 6 sets of data then N =6. So df =5

From Wiki:

In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (most of the time the sample variance has N − 1 degrees of freedom, since it is computed from N random scores minus the only 1 parameter estimated as intermediate step, which is the sample mean).

In your particular case, the allele frequency is being estimated from the data, and that estimate is then used to compute the expected genotype frequency.

For example, for $A$ the computation is $0.0987 + 0.1624/2 + 0.1752/2 = 0.2675$. The frequency of $B$ is computed the same way, and $C$ follows from $C = 1 - A - B$. These allele frequencies are then used to get the expected frequencies in the next column.

So two more parameters, the frequencies of alleles $A$ and $B$, have been estimated from the data, and the degrees of freedom are $6 - 1 - 2 = 3$.

4
On

The basic rule is that "degrees of freedom" is "number of variables" - "number of constraints".

You are used to having a "known background population". For example, you might know that each genotype has a 1/6 frequency in the general population. If you knew that, then there would be 5 degrees of freedom. However, you don't know that.

In this problem, the 0.2675 frequency of allele A is not known in advance. It's being estimated from the data. We can see that 9.87% of the time 100% of the alleles are A (AA genotype) and 16.24%+17.52% of the time 50% of the alleles are A (AB and CA genotypes, respectively).

9.87% + 0.5 × (16.24% + 17.52%) = 26.75%, which is the allele frequency for A. The allele frequencies for B and C are calculated the same way.
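If it helps to see that arithmetic spelled out, here is a minimal Python sketch. The function name `allele_frequency` and the variable names are made up for illustration; only the three genotype frequencies quoted above are used, since the other genotypes contain no A allele.

```python
# Genotype frequencies quoted in the answer (as proportions, not percent).
freq_AA = 0.0987  # both alleles are A
freq_AB = 0.1624  # one of the two alleles is A
freq_CA = 0.1752  # one of the two alleles is A


def allele_frequency(homozygous, heterozygous):
    """All of the homozygote frequency plus half of each heterozygote frequency."""
    return homozygous + 0.5 * sum(heterozygous)


p_A = allele_frequency(freq_AA, [freq_AB, freq_CA])
print(round(p_A, 4))  # 0.2675
```

The same function would give the frequency of B from the BB genotype and the two heterozygotes that contain B.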

The expected frequencies are then calculated from the allele frequencies. If every genotype were just two independent random alleles (as per the null hypothesis), then the frequency of genotype AA would be (26.75%)² = 7.16%. That is the expected frequency of genotype AA.
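As a rough sketch of that step in Python: under the independence assumption, a homozygote's expected frequency is the allele frequency squared. The heterozygote case below uses the standard Hardy-Weinberg form 2pq, which this answer does not spell out, so treat it as an added assumption. The function names are hypothetical.

```python
def expected_homozygote(p):
    """Expected frequency of a genotype with two copies of the same allele."""
    return p * p


def expected_heterozygote(p, q):
    """Expected frequency of a genotype with two different alleles
    (standard Hardy-Weinberg form; the factor 2 covers both orderings)."""
    return 2 * p * q


p_A = 0.2675  # allele A frequency estimated from the data above
expected_AA = expected_homozygote(p_A)
print(round(expected_AA, 4))  # 0.0716, i.e. 7.16%
```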

What does all this mean? It means that the expected frequencies are determined using the experimental data, and you need to account for those constraints. In this context, two allele frequencies (let's arbitrarily say A and B) were computed from the data as described above. The allele C frequency isn't really computed the same way, since it's just 1 - the allele A and B frequencies, so overall that's 2 constraints for allele frequencies.

As a trivial example, consider having only 2 alleles. You'd have 3 genotypes (AA, AB, BB). However, you'd really just have one degree of freedom: is AB over-represented (A and B occur together more than you'd expect) or under-represented (A and B occur together less than you'd expect)?
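The counting can be sketched as a small Python function, assuming the setup described here: every unordered pair of alleles is a genotype category, the frequencies must sum to 1 (one constraint), and all but one allele frequency is estimated from the data. The name `chi_square_df` is just for illustration.

```python
def chi_square_df(n_alleles):
    """Degrees of freedom when allele frequencies are estimated from the data."""
    # Unordered pairs with repetition: AA, AB, ... gives k(k+1)/2 genotypes.
    n_genotypes = n_alleles * (n_alleles + 1) // 2
    # The last allele frequency is 1 minus the others, so only k-1 are estimated.
    n_estimated = n_alleles - 1
    # Categories, minus 1 for the total, minus the estimated parameters.
    return n_genotypes - 1 - n_estimated


print(chi_square_df(3))  # 3  (6 - 1 - 2, the homework case)
print(chi_square_df(2))  # 1  (3 - 1 - 1, the two-allele example)
```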