Intuition for Spearman's rank correlation coefficient


I have a dataset containing a set of real numbers $X$ and a set of real numbers $Y$. I've fit a linear regression model $m$ on $X$ to predict the $Y$ values. As an evaluation measure, I'm using Spearman's rank correlation coefficient.
Something about it seems weird, however. Suppose the correlation coefficient between the model's predictions on $X$ and the true values $Y$ is $a$. When I split $X$ and $Y$ into three subsets ($X_1$, $X_2$, $X_3$ and $Y_1$, $Y_2$, $Y_3$), use the model to predict each subset (giving $Y'_1$, $Y'_2$, $Y'_3$), and calculate the correlation for each pair ($Y_1$ and $Y'_1$, etc.), I get results that are all less than $a$. I don't understand what is happening here. Shouldn't the overall correlation be somewhere between those three?

Best answer

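No, not necessarily. Because the predictions of a linear model are a monotone function of $X$, the rank correlation between $Y'_i$ and $Y_i$ equals that between $X_i$ and $Y_i$, up to sign. Now suppose $Y$ is a sawtooth function of $X$ trending upwards, and each subset $(X_i,\,Y_i)$ is one tooth of the saw, with rank correlation close to $0$. Each of these is less than the correlation for the pooled dataset, where the upward trend across teeth dominates. Pooling datasets is known to produce larger correlations in some cases.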
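To make the sawtooth picture concrete, here is a minimal sketch in plain Python. The data are made up purely for illustration: three symmetric triangular "teeth" of 21 points each, with every tooth shifted up by 100 relative to the previous one. The helper names (`average_ranks`, `spearman`) and the hand-rolled least-squares fit are assumptions of this sketch, not anything from the question.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Illustrative data (an assumption of this sketch): tooth k covers
# x = 21k .. 21k+20 and rises from 100k to 100k+10, then falls back.
teeth = []
for k in range(3):
    xs = [21 * k + j for j in range(21)]
    ys = [100 * k + (10 - abs(j - 10)) for j in range(21)]
    teeth.append((xs, ys))

x_all = [x for xs, _ in teeth for x in xs]
y_all = [y for _, ys in teeth for y in ys]

# Ordinary least squares on the pooled data, standing in for the model m.
mx, my = mean(x_all), mean(y_all)
slope = (sum((x - mx) * (y - my) for x, y in zip(x_all, y_all))
         / sum((x - mx) ** 2 for x in x_all))
intercept = my - slope * mx

def predict(xs):
    return [slope * x + intercept for x in xs]

pooled_rho = spearman(predict(x_all), y_all)
tooth_rhos = [spearman(predict(xs), ys) for xs, ys in teeth]

print("pooled:", round(pooled_rho, 3))
print("per tooth:", [round(r, 3) for r in tooth_rhos])
```

With this construction each tooth is symmetric, so its rank correlation is zero, while the pooled value is about 0.89 because the ordering between teeth is perfectly preserved. All three subset correlations therefore fall below the pooled one, which is exactly the situation described in the question.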