The Pearson correlation coefficient is lower for the samples than for the population. Why?


A population is divided into samples such that the samples are disjoint (their pairwise intersection is empty) and the union of the samples is the population. Why can the Pearson correlation coefficient of every sample be smaller than the population correlation coefficient?

Note: all coefficients are positive.

Example: The population has two variables, and we want to know the correlation between them.

u--population

a_1--sample 1

a_2--sample 2

a_3--sample 3

Each sample and the population contain the same two variables, and the correlation is computed between them.

The intersection between the 3 samples is empty and their union is u.

Correlation of a_1=0.23

Correlation of a_2=0.26

Correlation of a_3=0.19

Correlation of u=0.28

Why can the population correlation be numerically greater than every one of the sample correlations?

I can't find any logic in this.

I would expect at least one sample to have a correlation at least as large as the population's, for example:

Correlation of a_1=0.23

Correlation of a_2=0.30

Correlation of a_3=0.19

Correlation of u= 0.28
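
To check whether the first situation is even arithmetically possible, here is a minimal sketch with made-up data (not the data behind the numbers above). It assumes three disjoint samples that share the same weak within-sample pattern but have different means, shifted along the diagonal. The names `pearson`, `base`, `samples` and `population` are hypothetical and chosen only for this illustration.

```python
import math

def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Made-up pattern with a modest positive correlation inside one sample.
base = [(0, 0), (1, 3), (2, 1), (3, 2)]

# Three disjoint samples: the same pattern, shifted by 0, 2 and 4
# in both variables, so the sample means differ.
samples = [[(x + s, y + s) for x, y in base] for s in (0, 2, 4)]

# The population is the union of the three samples.
population = [p for sample in samples for p in sample]

sample_rs = [pearson(s) for s in samples]
population_r = pearson(population)

print("sample correlations:   ", [round(r, 3) for r in sample_rs])
print("population correlation:", round(population_r, 3))
print("every sample below population:",
      all(r < population_r for r in sample_rs))
```

In this toy data every sample has the same correlation, and the pooled correlation comes out larger than all of them, because the differences between the sample means add covariance that no single sample sees. Whether that is what happens in my real data is an assumption I have not verified.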