Cross-correlation of identical sets: not getting expected result

I'm trying to work out the correlation coefficient of two sets using a given formula, but I'm not getting a perfect correlation when using identical sets.

The correlation between a client’s profile (X) and an occupational profile (Y) is given mathematically as follows:

$$r_{XY}=\frac{\sum(X-\bar X)(Y-\bar Y)}{N\sigma_X\sigma_Y}$$

where $\bar X$ and $\bar Y$ and $\sigma_X$ and $\sigma_Y$ are the means and standard deviations of $X$ and $Y$, respectively, and $N$ is the number of scores to be correlated (i.e., the number of scores constituting the client’s profile)$^*$. The correlation indexes the similarity of the shape (but not the level) between the client and occupation profiles.

*Note that $\sigma$ represents variability of the sample at hand and uses a divisor of $N$.

(http://www.onetcenter.org/dl_files/IPSF_Linking.pdf page 4)

Basically, the formula assesses the correlation between two profiles. This is what the candidate profiles look like (each candidate has six scores); I would choose any one as $X$ and another as $Y$:

|      | Candidate 1 | Candidate 2 | Candidate 3 | ... Candidate N |
|------|-------------|-------------|-------------|-----------------|
| A    | 1.5         | 2           | 4.33        |                 |
| B    | 4.33        | 4.33        | 6.33        |                 |
| C    | 1.67        | 1.67        | 1.67        |                 |
| D    | 0.67        | 1           | 2           |                 |
| E    | 2.67        | 3.33        | 3.33        |                 |
| F    | 2.67        | 6           | 6.66        |                 |
| Mean | 2.25167     | 3.055       | 4.05333     |                 |
| S.D. | 1.27076     | 1.87769     | 2.12010     |                 |

Other correlation measures are not an option: I must use the given formula. I'm doing this using the mean values and standard deviations above:

$$\frac{(X_A - \bar X)(Y_A - \bar Y) + (X_B - \bar X)(Y_B - \bar Y) + \cdots + (X_F - \bar X)(Y_F - \bar Y)}{6\sigma_X\sigma_Y}$$

...but when I do it with identical series, the result is only ever 0.8333... and I expect a correlation coefficient of 1. Perhaps not coincidentally, 0.8333 is five-sixths of the expected value.

Any ideas where I'm going wrong?

Best answer

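You are probably using the unbiased estimator of the variance, which divides by $N-1$ instead of $N$, so the $\sigma^2$ you're calculating is actually 6/5 of what the formula expects. For identical sets the numerator is $\sum(X-\bar X)^2$, which equals $N\sigma_X^2$ only when $\sigma_X$ uses the divisor $N$; with the divisor $N-1$ the ratio becomes $(N-1)/N = 5/6 \approx 0.8333$, as you pointed out yourself. The S.D. of 1.27076 listed for Candidate 1 is consistent with this: it is the $N-1$ value, while the divisor-$N$ value is about 1.160. Take a look at Wikipedia here and here.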
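As a quick check, here is a minimal Python sketch of the quoted formula applied to Candidate 1 against itself, once with each kind of standard deviation. The function name `profile_correlation` and its `sd` parameter are made up for this illustration; `pstdev` (divisor $N$) and `stdev` (divisor $N-1$) come from Python's standard `statistics` module.

```python
from statistics import mean, pstdev, stdev


def profile_correlation(x, y, sd=pstdev):
    """r_XY = sum((X - mean X)(Y - mean Y)) / (N * sd(X) * sd(Y)).

    `sd` is the standard deviation function to use: pstdev divides by N
    (what the quoted footnote asks for), stdev divides by N - 1.
    """
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / (n * sd(x) * sd(y))


# Candidate 1 scores A..F from the table in the question.
candidate_1 = [1.5, 4.33, 1.67, 0.67, 2.67, 2.67]

# Identical profiles, divisor-N standard deviation.
r_population = profile_correlation(candidate_1, candidate_1, sd=pstdev)

# Identical profiles, divisor-(N-1) standard deviation.
r_sample = profile_correlation(candidate_1, candidate_1, sd=stdev)

print(round(r_population, 4))  # 1.0
print(round(r_sample, 4))      # 0.8333
```

If the standard deviations come from a spreadsheet or another library, the same distinction usually applies: look for the "population" variant rather than the "sample" one.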