Can Pearson's Correlation coefficient distinguish between $y=x$ and $y=\sqrt{x}$?


For a set of points $(x, y)$, I obtained a Pearson's r of $0.9936004531$.

For the same set of points, I changed them to $(\sqrt{x}, y)$ (took the square root of every $x$ value), and I obtained a Pearson's r of $0.9997411537$, which is greater by $0.0061407006$.

These data were obtained from a physics experiment I did, where theoretically $y \propto \sqrt{x}$.

How should I interpret this result?

I am leaning towards the idea that, because the difference in Pearson's r is so small, we cannot conclude whether $y \propto \sqrt{x}$ or not; we can only vaguely conclude that as $x$ increases, $y$ increases in some fashion.

Thank you very much for the help.

Edit: There were only five data points, precise to 3 decimal places.

The five points:

$ (5, 0.321) (10, 0.395) (15, 0.457) (20, 0.510) (25, 0.550) $
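For reference, the two $r$ values above can be reproduced from these five points with a short Python sketch (plain standard library, using the usual centered-sums formula for Pearson's r):

```python
import math

x = [5, 10, 15, 20, 25]
y = [0.321, 0.395, 0.457, 0.510, 0.550]

def pearson(a, b):
    # Pearson's r = S_ab / sqrt(S_aa * S_bb), using centered sums of products
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

r_linear = pearson(x, y)                         # y vs x
r_sqrt = pearson([math.sqrt(v) for v in x], y)   # y vs sqrt(x)
print(r_linear, r_sqrt)  # ≈ 0.99360 and 0.99974
```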

Edit 2: Graph on Excel for $(\sqrt{x}, y)$. [image]


Look at the Residual Plots

Pearson correlation alone is never enough to test whether a model fits. It's important to check that there is no noticeable trend in the residuals. Over small regions especially, linear models can work quite well (if you've taken calculus, you know that all nice functions are locally linear), so it might be better to widen the domain over which you're checking, or perhaps your data doesn't yet provide strong evidence for the square-root fit for some experimental reason.

Importantly, though, if you look at the residuals of the linear fit, an easy tell that the linear model is not working well is a clear systematic trend in the residuals, which is exactly what will happen if the square-root law is really the case.
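As a concrete sketch of that check, here is an ordinary least-squares line fitted to the five points from the question, with its residuals printed rather than plotted (plain Python, no libraries):

```python
x = [5, 10, 15, 20, 25]
y = [0.321, 0.395, 0.457, 0.510, 0.550]

# Ordinary least-squares line y = intercept + slope * x
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

# Residuals: observed minus predicted
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print([round(r, 4) for r in residuals])
# signs run -, +, +, +, - : an inverted U, i.e. systematic curvature, not noise
```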

EDIT: Plot of the residuals.


You don't just want to plot the data; you want to plot the residuals, the differences between observed and predicted values. I've done that above, and as you can see, a clear pattern emerges: the noise does not look random. Having no clear trend in the residuals is a standard requirement in linear regression diagnostics, so despite the high correlation (though it's usually better to think in terms of $r^2$, which is a bit lower than $r$), it seems quite statistically reasonable to say that a linear relationship is not supported by the data you have.

I'm betting that if you do this for the power law, a much less clear trend will emerge in the residuals, and so that linear regression might fail to be rejected by some hypothesis test, for linearity, say, making it a much better model with the data you have. It'd be better to have more data, so the relationship was more clear-cut, but it's obvious from the residuals that the failings in the linear model are due to more than noise.