Is my gamma calculation correct (statistics)


\begin{array}{|c|c|c|c|c|c|c|} \hline \text{Highest Degree}&\text{Don't Believe}& \text{No way to find out}&\text{Some Higher power}&\text{Believe Sometime}&\text{Believe but doubts}&\text{Know God exists}\\ \hline \text{Less than Highschool}&9& 8 & 27 & 8 & 47 & 236 \\ \hline \text{High School}&23 & 39 & 88&49 &179&706\\ \hline \text{Bachelor}&28 &48 &89 &19&104&293\\ \hline \end{array}

The Gamma measure of association is $G=\frac{n_c-n_d}{n_c+n_d}$

I got this:

$n_{c1}=236*(23+39+88+49+179+28+48+89+19+104)=157176$

$n_{c2}=47*(23+39+88+49+28+48+89+19)=18001$

...

$n_{c10}=39*(28)=1092$

Total $n_{c1}+...+n_{c10}=433960$

and

$n_{d1}=9*(39+88+49+179+706+48+89+19+104+293)=14526$

...

$n_{d10}=179*293=52447$

Total $n_{d1}+...+n_{d10}=261323$

So $G=\frac{433960-261323}{433960+261323}=0.28297$


I think you have flipped $n_c$ and $n_d$ in the way you calculate Goodman and Kruskal's Gamma, or you have presented your data in an unusual ordering (see below). The flip of $n_c$ and $n_d$ gives you the wrong sign, and your final value is also slightly off (I think you missed a $4$ in $0.2\mathbf{4}8297$).

For the ordering convention I follow

Sheskin, D.J. (2004) The Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, 3rd edition,

which is a very readable introduction to the subject (Chapter 32, page 1109).

It is important to note that Goodman and Kruskal's Gamma is a measure for ordinal data. This means that both variables need a natural ranking of their categories (I'll come back to this point at the end). By convention the categories are furthermore arranged from low to high values (for both rows and columns), and I will work under the assumption that your data are given in this format.

Finally I'll summarize your data in a matrix $M$ (with $r=3$ rows and $c=6$ columns) given as \begin{align*} M = \begin{pmatrix} 9 & 8 & 27 & 8 & 47 & 236\\ 23 & 39 & 88 & 49 & 179 & 706\\ 28 & 48 & 89 & 19 & 104 & 293 \end{pmatrix}. \end{align*}

To calculate $n_c$ we need to find, for every entry, the cells that are concordant with it and multiply their total by the entry's frequency. For example, for entry $m_{1,1}$ we compute \begin{align*} n^c_{1,1} &= m_{1,1} \sum^{r}_{k=1+1} \sum^{c}_{l=1+1} m_{k,l}\\ &= 9 (39+88+49+179+706+48+89+19+104+293)\\ &= 14526 \end{align*} or generally \begin{align*} n^c_{i,j} = m_{i,j} \sum^{r}_{k=i+1} \sum^{c}_{l=j+1} m_{k,l} \end{align*} (of course we set $n^c_{i,j} = 0$ when the sum is empty, i.e. for cells in the last row or the last column). The numbers $n^c_{i,j}$ are therefore found by adding up all cells that lie to the bottom right of $m_{i,j}$ and then multiplying this sum by the frequency $m_{i,j}$. Finally we have \begin{align*} n_c = \sum^{r}_{i=1} \sum^{c}_{j=1} n^c_{i,j} = 261323. \end{align*} You actually computed these numbers correctly, but you called them $n_d$, the discordant pairs; as I said, I'll stick to the usual ordering.
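The bottom-right rule can be checked with a few lines of Python. This is only a minimal sketch in plain Python; the function name `concordant_pairs` is my own choice for illustration and not taken from any library.

```python
# Contingency table M (rows: education level, columns: belief category).
M = [
    [9, 8, 27, 8, 47, 236],
    [23, 39, 88, 49, 179, 706],
    [28, 48, 89, 19, 104, 293],
]

def concordant_pairs(table):
    """Sum over all cells of m[i][j] times the total of the cells
    strictly below and strictly to the right of it."""
    r, c = len(table), len(table[0])
    total = 0
    for i in range(r):
        for j in range(c):
            below_right = sum(
                table[k][l] for k in range(i + 1, r) for l in range(j + 1, c)
            )
            total += table[i][j] * below_right
    return total

n_c = concordant_pairs(M)
print(n_c)  # 261323
```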

Now to get the values for the discordant pairs we calculate \begin{align*} n^d_{i,j} = m_{i,j} \sum^{r}_{k=i+1} \sum^{j-1}_{l=1} m_{k,l} \end{align*} (again, empty sums give the value $0$). The numbers $n^d_{i,j}$ are therefore found by adding up all cells that lie to the bottom left of $m_{i,j}$ and then multiplying the result by the frequency $m_{i,j}$. Finally we have \begin{align*} n_d = \sum^{r}_{i=1} \sum^{c}_{j=1} n^d_{i,j} = 433960. \end{align*} As before, you computed these numbers correctly, but you assigned them to the concordant pairs.
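The same kind of check works for the discordant pairs, this time summing the cells below and to the left of each entry, as described in the text. Again a minimal sketch; `discordant_pairs` is a name made up for this illustration.

```python
# Contingency table M (rows: education level, columns: belief category).
M = [
    [9, 8, 27, 8, 47, 236],
    [23, 39, 88, 49, 179, 706],
    [28, 48, 89, 19, 104, 293],
]

def discordant_pairs(table):
    """Sum over all cells of m[i][j] times the total of the cells
    strictly below and strictly to the left of it."""
    r = len(table)
    total = 0
    for i in range(r):
        for j in range(len(table[0])):
            below_left = sum(
                table[k][l] for k in range(i + 1, r) for l in range(j)
            )
            total += table[i][j] * below_left
    return total

n_d = discordant_pairs(M)
print(n_d)  # 433960
```

Counting the cells above and to the right instead gives the same total, since each discordant pair is then simply counted from its other member.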

Using my numbers I get a final value of \begin{align*} G = \frac{n_c - n_d}{n_c + n_d} = \frac{261323 - 433960}{261323 + 433960} = -0.2482975, \end{align*} where I think you had a typo missing the $4$, and the sign is wrong (due to flipping $n_c$ and $n_d$).
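To double check the final value, here is a minimal sketch that computes both counts in one pass and then forms $G$. The function name `goodman_kruskal_gamma` is my own and only serves this illustration.

```python
# Contingency table M (rows: education level, columns: belief category).
M = [
    [9, 8, 27, 8, 47, 236],
    [23, 39, 88, 49, 179, 706],
    [28, 48, 89, 19, 104, 293],
]

def goodman_kruskal_gamma(table):
    """Return (n_c, n_d, G) for a table whose rows and columns
    are both ordered from low to high."""
    r, c = len(table), len(table[0])
    n_c = n_d = 0
    for i in range(r):
        for j in range(c):
            for k in range(i + 1, r):          # rows strictly below
                n_c += table[i][j] * sum(table[k][j + 1:])  # right: concordant
                n_d += table[i][j] * sum(table[k][:j])      # left: discordant
    return n_c, n_d, (n_c - n_d) / (n_c + n_d)

n_c, n_d, G = goodman_kruskal_gamma(M)
print(n_c, n_d, round(G, 7))  # 261323 433960 -0.2482975
```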

That is as much as I can say if this is treated as an exercise in getting familiar with a statistical tool.

In terms of interpretation I would however be cautious: as I noted earlier, this analysis only makes sense if it is possible to rank the data. In terms of education the ranking is clear, but ranking beliefs seems a bit less obvious to me: is "don't believe" bigger or smaller than "Know God exists"? You could probably get away with labeling it strength of belief, but even then I would advise double checking the literature to make sure this setup has a valid interpretation.
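How much the direction of the ranking matters can be seen directly: reversing the order of the belief categories swaps the roles of concordant and discordant pairs, so $G$ keeps its magnitude and changes sign. The sketch below counts pairs straight from the definition; the helper name `gamma` is hypothetical.

```python
# Contingency table M (rows: education level, columns: belief category).
M = [
    [9, 8, 27, 8, 47, 236],
    [23, 39, 88, 49, 179, 706],
    [28, 48, 89, 19, 104, 293],
]

def gamma(table):
    """Goodman and Kruskal's gamma, counting cell pairs by definition."""
    cells = [(i, j, m) for i, row in enumerate(table) for j, m in enumerate(row)]
    n_c = n_d = 0
    for i, j, m in cells:
        for k, l, n in cells:
            if k > i and l > j:      # both variables increase: concordant
                n_c += m * n
            elif k > i and l < j:    # one increases, one decreases: discordant
                n_d += m * n
    return (n_c - n_d) / (n_c + n_d)

# Same data with the belief categories listed in the opposite direction.
M_reversed = [row[::-1] for row in M]

print(round(gamma(M), 6))           # -0.248297
print(round(gamma(M_reversed), 6))  # 0.248297
```

So the sign of the result only says something once it is settled which end of the belief scale counts as "high".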

P.S. Instead of doing these calculations by hand, you can let standard statistical software do them for you :-) (for example R with the package vcdExtra).