In oil and gas exploration and development it is common to use acoustic impedance derived from reflection seismic surveys to predict the porosity measured in wells drilled in the reservoir.
I often use tables such as the one below (from a paper) to test for spuriousness of correlation:
R \ n     5    10    15    20    25    35    50    75   100
0.1    0.87  0.78  0.72  0.67  0.63  0.57  0.49  0.39  0.32
0.2    0.75  0.58  0.47  0.40  0.34  0.25  0.16  0.09  0.05
0.3    0.62  0.40  0.28  0.20  0.15  0.08  0.03  0.00  0.00
0.4    0.50  0.25  0.14  0.08  0.05  0.02  0.00  0.00  0.00
0.5    0.39  0.14  0.06  0.02  0.01  0.00  0.00  0.00  0.00
0.6    0.28  0.07  0.02  0.01  0.00  0.00  0.00  0.00  0.00
0.7    0.19  0.02  0.00  0.00  0.00  0.00  0.00  0.00  0.00
0.8    0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
0.9    0.04  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
where the values in the table give the probability that the absolute value of the sample correlation coefficient, r, is at least some constant R when the true correlation (ρ) is zero; in other words, the probability of a spurious correlation.
These values are calculated with the expression (in both papers):
$$p = \Pr(|r| \ge R) = \Pr\left(|t| \ge \frac{R\sqrt{n-2}}{\sqrt{1-R^2}}\right)$$
where n is the sample size, i.e. the number of locations (wells) where both the reservoir property (porosity) and the seismic attribute (acoustic impedance) are available, and t follows a Student's t distribution with n-2 degrees of freedom.
For the columns in this table n is, respectively:
5 10 15 20 25 35 50 75 100
For the rows, R (the magnitude of the spurious sample correlation) is, respectively:
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
This table is used to assess the chance that the sample correlation, r, is actually spurious, i.e. that the attribute is uncorrelated with the reservoir property being predicted. Quoting again:
For example, given 5 wells, and an r = 0.7, there is a 19% probability that
the correlation is false.
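As a check on the table, here is a minimal sketch that evaluates the expression above using only the Python standard library. It computes the two-sided Student's t tail by numerical integration after the substitution t = sqrt(n-2)·tan(φ), which turns the tail into a smooth integral of cos(φ)^(n-3) from asin(R) to π/2. The function name `spurious_prob` and the Simpson step count are my own choices for illustration and do not come from either paper.

```python
import math

def spurious_prob(R, n, steps=2000):
    """Two-sided Pr(|r| >= R) when the true correlation is zero.

    Equivalent to Pr(|t| >= R*sqrt(n-2)/sqrt(1-R^2)) with n-2 degrees
    of freedom, evaluated with Simpson's rule (steps must be even).
    """
    nu = n - 2  # degrees of freedom
    # Normalising constant of the t density after the tan substitution
    const = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi) * math.gamma(nu / 2))
    a, b = math.asin(R), math.pi / 2
    h = (b - a) / steps

    def f(phi):
        return math.cos(phi) ** (nu - 1)

    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    one_sided = const * total * h / 3
    return 2 * one_sided

# Rebuild the table: rows are R, columns are n
ns = [5, 10, 15, 20, 25, 35, 50, 75, 100]
Rs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
for R in Rs:
    print(R, " ".join(f"{spurious_prob(R, n):.2f}" for n in ns))
```

For n = 5 and R = 0.7 this gives roughly 0.19, matching the quoted 19%. If SciPy is available, `2 * scipy.stats.t.sf(t0, n - 2)` with t0 = R√(n-2)/√(1-R²) should give the same numbers more directly.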
I've used this method for years and (I thought) I understood the theory and application well.
However, I recently read this statement in a paper that expands on the original paper where the table was published:
...however there is another aspect of the correlation coefficient that should be considered — the confidence limits of the true correlation coefficient. For this example, the 95% confidence limits are from a minimum r of -0.48 (P97.5) and a maximum r of 0.98 (P2.5). Because the minimum r is negative, we cannot say with confidence that there is any correlation and we should reject this attribute as a predictor. Considering one seismic attribute and a sample correlation of 0.7, we need 9 samples before the minimum r is positive, but its value is only 0.07, with a 4% chance that the correlation is false.
This is my question: where is this coming from? Neither the original paper nor the recent one published the data at each well, just the tables, so how can the author of the latter estimate the 95% confidence limits?
All I could think of is bootstrapping the 95% confidence interval around the mean r ... except that even for that they'd need at least the one sample (the 7 wells) to get the mean.
Is there any other way to get at that just using the values in the table?
I found an explanation with a worked example on this site: http://www.tc3.edu/instruct/sbrown/stat/correl.htm
They even have an Excel spreadsheet.
Here's the working example:
So using their spreadsheet I get for my example of n=5 and r=0.7
So the paper was correct in giving an interval for r of -0.48 (P97.5) to 0.98 (P2.5).
Also, at 8 wells the minimum r is still negative:
and at 9 it becomes positive, but only 0.067
so correct again.
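For anyone who wants to reproduce these numbers without the spreadsheet, here is a minimal sketch. It assumes the interval comes from the Fisher z-transformation (z = atanh(r), standard error 1/sqrt(n-3), normal critical value), which needs only r and n and not the well data. That is my assumption about the method, since neither paper states it, but it does reproduce the figures quoted above. The function name `fisher_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for the true correlation.

    Assumes the Fisher z-transformation: atanh(r) is treated as
    approximately normal with standard error 1/sqrt(n - 3).
    Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    zcrit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # Transform the interval back from z to the correlation scale
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)

# Sample correlation of 0.7 at the well counts discussed above
for n in (5, 8, 9):
    lo, hi = fisher_ci(0.7, n)
    print(f"n={n}: lower={lo:.3f}, upper={hi:.3f}")
```

With r = 0.7 this gives an interval of about -0.48 to 0.98 for n = 5, a lower limit that is still slightly negative for n = 8, and a lower limit of about 0.067 for n = 9, in agreement with the paper and the spreadsheet.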