Linear regression correlation coefficient significance vs. importance?


I ran a simple linear regression in Excel between variables x and y.

Pearson’s r is 0.3, and the p-value for the x coefficient is 0.5 - far above my alpha of 5%. Therefore, I conclude the correlation is “insignificant.”
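(For context, the two numbers above can be reproduced outside Excel. The sketch below uses Python with SciPy on hypothetical stand-in data, since the actual financial series isn't shown; for simple linear regression, the p-value of the slope coefficient is the same as the p-value of Pearson's $r$.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=30)  # hypothetical stand-ins for the
y = rng.normal(size=30)  # financial series in the question

# Pearson's r and its two-sided p-value
r, p = stats.pearsonr(x, y)

# Simple linear regression: the slope's p-value matches pearsonr's,
# because both come from the same t statistic.
res = stats.linregress(x, y)
print(r, p, res.pvalue)
```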

Here’s my confusion: the correlation coefficient for my model is 0.3, so my sample data are indeed correlated in that sense. It’s a weak correlation, but a correlation nonetheless. We deem it statistically insignificant because the p-value is too high, meaning we failed to reject the null hypothesis, which I assume is “the correlation between x and y in the population is 0.”

Doesn’t the 0.3 correlation have some use? I am looking at financial data, so the correlation shows that my sample data are not strongly correlated. My question is this: since the correlation is insignificant, does this mean the 0.3 correlation must be completely disregarded? Is it of no use? If it is of some use, how?

BEST ANSWER

My question is this: since the correlation is insignificant, does this mean the 0.3 correlation must be completely disregarded?

The problem with the p-value is that it depends on the sample size $n$: the true correlation may be nonzero, yet with noisy data and a small $n$ you may still fail to reject the null hypothesis. On the other hand, if your sample size is large enough, then even with noisy data a p-value of $0.5$ is a good indication that there is no linear correlation between your variables.

A good reality check is a nonparametric bootstrap: sample $n$ pairs of data with replacement, $N$ times, and inspect how the estimated $r$ varies across resamples. If $r$ frequently falls on both sides of zero (i.e., sometimes positive and sometimes negative), you may conclude that your point estimate of $0.3$ is merely due to chance and has no practical use.
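The bootstrap check described above can be sketched as follows. This is a minimal illustration in Python/NumPy on simulated data (the question's actual series isn't available); the sample size and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_r(x, y, n_boot=5000):
    """Nonparametric bootstrap of Pearson's r: resample (x, y) pairs
    with replacement and recompute r for each resample."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n pairs drawn with replacement
        rs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return rs

# Hypothetical noisy data with a weak true relationship and small n,
# loosely mimicking the situation in the question.
x = rng.normal(size=20)
y = 0.3 * x + rng.normal(size=20)

rs = bootstrap_r(x, y)
# If the bootstrap distribution of r straddles zero with appreciable
# mass on both sides, the point estimate is consistent with chance.
print("fraction of resamples with r < 0:", np.mean(rs < 0))
print("fraction of resamples with r > 0:", np.mean(rs > 0))
```

The two printed fractions summarize the sign stability of $r$; equivalently, one could report a bootstrap percentile interval (e.g. `np.percentile(rs, [2.5, 97.5])`) and check whether it covers zero.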