Linear Regression with Paired Data


For a sample of paired data (x, y), t tests are performed for the slopes of the population regression lines of y on x and of x on y. The null hypothesis in both tests is $H_0: \beta = 0$.

Is it possible for the two tests to have different results (one null hypothesis is rejected while the other fails to be rejected)?

Also, when creating a confidence interval for the slope of the population regression line for paired data (vs. independent data), does the equation $b \pm t^* SE_b$, or the formula for the standard error, $SE_b = s/(s_x\sqrt{n-1})$, change?

Best answer

Correlation is a symmetrical computation. Regression of y on x is not, because it minimizes the sum of squared residuals in the y-direction, with a view toward predicting y-values from x-values.

Nevertheless, both tests of $H_0: \beta = 0$ (based on the sample estimate $\hat \beta$ in either direction) are mathematically equivalent to the test of $H_0: \rho = 0$ (based on the symmetrical sample correlation $r$). The two slope tests therefore always give the same t statistic and the same P-value, so it is not possible for one to reject while the other fails to reject. The association between the x and y observations either has a sufficiently strong linear component for regression to be feasible, or it does not. (Note: This discussion depends on having both standard deviations $s_x$ and $s_y$ be positive.)
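A short derivation may make the equivalence concrete. It is a sketch that assumes the usual simple-regression identities $b_{y|x} = r\,s_y/s_x$ and $s_{y|x}^2 = \frac{n-1}{n-2}\,s_y^2\,(1-r^2)$, together with the standard-error formula quoted in the question:

```latex
\begin{aligned}
SE_{b_{y|x}} &= \frac{s_{y|x}}{s_x\sqrt{n-1}}
             = \frac{s_y\sqrt{1-r^2}}{s_x\sqrt{n-2}}, \\[4pt]
t_{y|x} &= \frac{b_{y|x}}{SE_{b_{y|x}}}
        = \frac{r\,s_y/s_x}{s_y\sqrt{1-r^2}\,/\,\bigl(s_x\sqrt{n-2}\bigr)}
        = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}.
\end{aligned}
```

The final expression contains only $r$ and $n$, so exchanging the roles of x and y gives the same value: $t_{x|y} = t_{y|x}$, and this is also the statistic for testing $H_0: \rho = 0$, with $n-2$ degrees of freedom.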

By contrast, the standard error used in the CI for $\beta$ depends on the 'direction' of the regression. Notice the factor $s_x$ in the formula you give in your question; for the regression of x on y it is replaced by $s_y$. Also your $s$, which I take to be $s_{y|x}$ or $s_{x|y}$, depends on the direction of the regression.
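To illustrate how the standard error changes with direction, here is a minimal Python sketch using the five (x, y) pairs listed in this answer. The helper name `slope_and_se` and the constant `T_STAR` are hypothetical, introduced only for this example; `T_STAR = 3.182` is the tabled two-sided 95% critical value of t with 3 degrees of freedom, entered by hand rather than computed.

```python
import math

# Data copied from the Minitab "print c1 c2" display in this answer.
x = [111.103, 96.039, 91.698, 92.701, 109.953]
y = [46.4498, 45.3056, 37.9565, 60.3286, 42.0762]


def slope_and_se(pred, resp):
    """Least-squares slope of resp on pred, and SE = s / (s_pred * sqrt(n - 1))."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(resp) / n
    spp = sum((p - mean_p) ** 2 for p in pred)
    srr = sum((q - mean_r) ** 2 for q in resp)
    spr = sum((p - mean_p) * (q - mean_r) for p, q in zip(pred, resp))
    b = spr / spp
    # Residual standard deviation s, with n - 2 degrees of freedom.
    s = math.sqrt((srr - spr ** 2 / spp) / (n - 2))
    # Sample standard deviation of the predictor.
    s_pred = math.sqrt(spp / (n - 1))
    se = s / (s_pred * math.sqrt(n - 1))
    return b, se


# Assumption: tabled critical value for a 95% CI with n - 2 = 3 df.
T_STAR = 3.182

b_yx, se_yx = slope_and_se(x, y)  # regression of y on x
b_xy, se_xy = slope_and_se(y, x)  # regression of x on y

ci_yx = (b_yx - T_STAR * se_yx, b_yx + T_STAR * se_yx)
ci_xy = (b_xy - T_STAR * se_xy, b_xy + T_STAR * se_xy)

print("y on x: b = %.4f, SE = %.4f, 95%% CI = (%.3f, %.3f)" % (b_yx, se_yx, *ci_yx))
print("x on y: b = %.4f, SE = %.4f, 95%% CI = (%.3f, %.3f)" % (b_xy, se_xy, *ci_xy))
```

The two standard errors should come out close to the `SE Coef` values in the Minitab output (0.5017 for y on x, 0.6329 for x on y). The form $b \pm t^* SE_b$ is the same in both directions; only the numbers plugged into it differ.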

Below is output from Minitab for independently generated x and y observations. Neither of the slopes nor the correlation differs significantly from 0.

MTB > name c1 'x'
MTB > rand 5 c1;
SUBC> norm 100 10.
MTB > name c2 'y'
MTB > rand 5 c2;
SUBC> norm 50 8.
MTB > desc c1 c2

Descriptive Statistics: x, y 

Variable  N  N*    Mean  SE Mean  StDev  Minimum     Q1  Median      Q3
x         5   0  100.30     4.24   9.48    91.70  92.20   96.04  110.53
y         5   0   46.42     3.78   8.44    37.96  40.02   45.31   53.39

Variable  Maximum
x          111.10
y           60.33

MTB > corr c1 c2

Correlations: x, y 

Pearson correlation of x and y = -0.218
P-Value = 0.725

Correlation not significant. P-value = 0.725

MTB > regr c1 1 c2

Regression Analysis: x versus y 

The regression equation is
x = 112 - 0.245 y

Predictor     Coef  SE Coef      T      P
Constant    111.65    29.77   3.75  0.033
y          -0.2445   0.6329  -0.39  0.725

S = 10.6881   R-Sq = 4.7%   R-Sq(adj) = 0.0%
....

In regression of x on y, slope not significant. P-value = 0.725

MTB > regr c2 1 c1

Regression Analysis: y versus x 

The regression equation is
y = 65.9 - 0.194 x

Predictor     Coef  SE Coef      T      P
Constant     65.87    50.50   1.30  0.283
x          -0.1939   0.5017  -0.39  0.725

S = 9.51615   R-Sq = 4.7%   R-Sq(adj) = 0.0%
....

In regression of y on x, slope not significant. P-value = 0.725

Data are given below for reference.

 MTB > print c1 c2

 Data Display 

 Row        x        y
   1  111.103  46.4498
   2   96.039  45.3056
   3   91.698  37.9565
   4   92.701  60.3286
   5  109.953  42.0762
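
For readers without Minitab, the following Python sketch checks from the listed data that the t statistic is the same whether it is computed from $r$ or from either slope. The function names `t_for_slope` and `two_sided_p_df3` are hypothetical, chosen for this example. The P-value uses the closed-form CDF of the t distribution with exactly 3 degrees of freedom, so that part applies only to $n = 5$; for other sample sizes a statistics library would be needed.

```python
import math

# Data copied from the "print c1 c2" display above.
x = [111.103, 96.039, 91.698, 92.701, 109.953]
y = [46.4498, 45.3056, 37.9565, 60.3286, 42.0762]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Pearson correlation and the t statistic for H0: rho = 0.
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_for_slope(spp, srr, spr, n):
    """t = b / SE_b for the regression of the response on the predictor."""
    b = spr / spp
    s = math.sqrt((srr - spr ** 2 / spp) / (n - 2))
    se = s / math.sqrt(spp)  # identical to s / (s_pred * sqrt(n - 1))
    return b / se


t_y_on_x = t_for_slope(sxx, syy, sxy, n)
t_x_on_y = t_for_slope(syy, sxx, sxy, n)


def two_sided_p_df3(t):
    """Two-sided P-value for a t statistic with exactly 3 degrees of freedom."""
    u = abs(t) / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 2 * (1 - cdf)


print("r = %.3f" % r)
print("t from r      = %.4f" % t_from_r)
print("t for y on x  = %.4f" % t_y_on_x)
print("t for x on y  = %.4f" % t_x_on_y)
print("two-sided P   = %.3f" % two_sided_p_df3(t_from_r))
```

The three t values agree up to floating-point rounding, which is the numerical counterpart of the statement that both slope tests are equivalent to the test of $H_0: \rho = 0$.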