Domain of the Fisher z-transformation


We apply the Fisher z-transformation to our correlation matrices so that we arrive at approximately normally distributed data.

As you might know, the Fisher z-transformation is the function

z = 0.5*log((1+x)/(1-x)), where x is Pearson's correlation coefficient, ranging between -1 and 1.

Now I have been told that Fisher's z has a domain between -4 and 4, which does not make sense to me: as x approaches 1 or -1, z takes much larger positive or negative values, respectively.

Have I been misinformed or am I missing out on something? Thanks

There are 1 best solutions below

For bivariate normal data, Fisher's z transformation is intended to produce values that are approximately normally distributed with mean $\mu$ and standard deviation $\sigma$, as shown in the Wikipedia article on 'Fisher z transformation', which you should read.

The vast majority of the time, a normal random variable takes values in $(\mu-4\sigma, \mu+4\sigma),$ but there may be occasional exceptions. However, in the usual terminology, this would have to do with the $range$ of the transformation, not its domain.

Anyhow, the interval $(-4, 4)$ may apply to many practical cases. I don't know the full context of your question, but I'm having trouble seeing how this could be a useful statement overall.

In the usual terminology the $domain$ must be $(-1, 1)$ for the values of the correlation being transformed.
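To make the distinction concrete, here is a short worked derivation, offered as a sketch. It writes $r$ for the correlation being transformed, and the value of $\tanh 4$ is rounded.

$$z = \tfrac12 \ln\frac{1+r}{1-r} = \operatorname{artanh}(r), \qquad r = \tanh(z),$$

$$\lim_{r \to 1^-} z = +\infty, \qquad \lim_{r \to -1^+} z = -\infty.$$

So the domain is $(-1, 1)$ and the range is all of $\mathbb{R}$. Inverting the transformation gives

$$|z| < 4 \iff |r| < \tanh 4 \approx 0.99933,$$

which means $z$ falls outside $(-4, 4)$ only for correlations more extreme than about $\pm 0.9993$.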

So, on several levels, your source does not seem to have communicated sensible information.

Sometimes it helps to have an example. The simulation below makes $m = 100,000$ datasets of $n = 12$ points. The theoretical $\rho = .5$. Numerical descriptive statistics summarize the $m$ values of $r$ and of $z$. You can compare them with formulas in Wikipedia.
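As a rough guide to that comparison, the usual large-sample approximations for $z$ can be evaluated at $\rho = .5$ and $n = 12$. This is a sketch under the bivariate normal assumption; the symbol $\zeta$ for the transformed population correlation is introduced here for illustration.

$$\zeta = \operatorname{artanh}(\rho) = \tfrac12 \ln\frac{1 + 0.5}{1 - 0.5} = \tfrac12 \ln 3 \approx 0.5493,$$

$$\mathrm{SD}(z) \approx \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{9}} \approx 0.3333.$$

The simulated standard deviation of $z$ (about $0.332$) is close to $1/3$. The simulated mean of $z$ (about $0.577$) sits a little above $\zeta$; a commonly quoted first-order bias term, $\rho/(2(n-1)) = 0.5/22 \approx 0.0227$, accounts for most of that gap, since $0.5493 + 0.0227 \approx 0.572$.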

Graphs summarize the distributions of the $m$ values of $r$ and of $z$; the best-fitting normal density is shown for the second histogram; and a plot shows the domain and range of these values. For these plots the $range$ of z values extends from -1 to 2.59.

 m = 10^5;  n = 12          # m simulated datasets, each of n points
 r = numeric(m)             # storage for the m sample correlations
 for(i in 1:m) {
    u1 = rnorm(n); u2 = rnorm(n); u3 = rnorm(n)
    x = u1 + u2;  y = u2 + u3   # shared u2 gives Cor(x, y) = 1/2
    r[i] = cor(x, y) }
 summary(r); sd(r)
 ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 ## -0.7617  0.3428  0.5203  0.4837  0.6615  0.9888 
 ##  0.2392179
 z = .5*log((1+r)/(1-r))    # Fisher z; same as atanh(r)
 summary(z); sd(z)
 ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 ## -1.0000  0.3573  0.5768  0.5769  0.7954  2.5910 
 ##  0.3318328

[Figure: histogram of $r$; histogram of $z$ with the best-fitting normal density; plot of $z$ against $r$]