Partial regression coefficient calculated in two different ways


Consider observations on three variables $X_1,X_2$ and $X_3$. Suppose that $X_1$ is regressed on $X_2$. When the residual of this regression is regressed on $X_3$, the regression coefficient of $X_3$ is $\beta_3$. When $X_1$ is regressed on $X_2$ and $X_3$ simultaneously, the regression coefficient of $X_3$ is $\beta_3^{*}$. Show that $|\beta_3|\le|\beta_3^{*}|$.

The expressions are simpler if we use linear regression, but I still cannot establish the result. This problem looks quite interesting. Please feel free to share your approach! Thanks!
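For intuition, here is a quick numerical check of the claim (a sketch using NumPy; the data-generating process and variable names are my own, and I use no-intercept least squares throughout):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: X2 and X3 are correlated, and X1 depends on both.
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)
x1 = 1.0 * x2 + 2.0 * x3 + rng.normal(size=n)

# Step 1: regress X1 on X2 alone; keep the residual.
a = np.linalg.lstsq(x2[:, None], x1, rcond=None)[0][0]
resid = x1 - a * x2

# Step 2: regress the residual on X3 -> beta_3.
beta3 = np.linalg.lstsq(x3[:, None], resid, rcond=None)[0][0]

# Joint regression of X1 on (X2, X3) -> beta_3^*.
beta3_star = np.linalg.lstsq(np.column_stack([x2, x3]), x1, rcond=None)[0][1]

print(abs(beta3) <= abs(beta3_star))  # prints: True
```

The inequality is not an artifact of this particular seed; the answers below argue it holds in general.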



Best answer:

To make things easier, I will use $X,Y,Z$ in place of $X_1,X_2,X_3$, and assume the intercept is 0 (along with some other assumptions). The ideas should extend to more general cases.

We are given regressions:

(1) $X=aY+U$, with residual $U$.

(2) $U = X-aY = bZ+V$, with residual $V$.

(3) $X = cY+dZ+W$, with residual $W$.

We'd like to show $|b|\le|d|$.

Rewrite (2) as:

(4) $X = aY+bZ+V$

Compare (4) and (3): since, for a reasonable regression, $(c,d)$ minimizes $Var(W)$ over all coefficient pairs, and $(a,b)$ is one such pair, we have

(5) $Var(W)\le Var(V)$.

Similarly, compare (1) and (3): since $a$ minimizes $Var(X-aY)=Var(U)$ over all choices of the coefficient on $Y$, we have $Var(U)\le Var(X-cY)$. Substituting $U=bZ+V$ from (2) and $X-cY=dZ+W$ from (3), this reads

(6) $Var(bZ+V)\le Var(dZ+W)$.

Since, for a reasonable regression, $Cov(Z,V)=Cov(Z,W)=0$ (otherwise some correlation with $Z$ would be left unaccounted for by the coefficients), we can deduce from (6):

(7) $b^2Var(Z)+Var(V) \le d^2 Var(Z)+Var(W)$

Combining (7) with (5): since $Var(W)\le Var(V)$, inequality (7) forces $b^2Var(Z)\le d^2Var(Z)$, hence $|b|\le|d|$, as desired.
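As a sanity check, the intermediate steps (5)-(7) can be verified numerically. A sketch with my own synthetic data; I use mean squares rather than centered variances, since the mean square is the quantity no-intercept least squares actually minimizes:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
y = rng.normal(size=n)
z = 0.5 * y + rng.normal(size=n)
x = 2.0 * y - 1.5 * z + rng.normal(size=n)

msq = lambda t: np.mean(t ** 2)  # plays the role of Var for this check

a = (x @ y) / (y @ y)            # (1) X = aY + U
u = x - a * y
b = (u @ z) / (z @ z)            # (2) U = bZ + V
v = u - b * z
c, d = np.linalg.lstsq(np.column_stack([y, z]), x, rcond=None)[0]
w = x - c * y - d * z            # (3) X = cY + dZ + W

assert msq(w) <= msq(v)                          # (5)
assert msq(b * z + v) <= msq(d * z + w) + 1e-12  # (6)
assert abs(z @ v) < 1e-9 and abs(z @ w) < 1e-9   # residuals orthogonal to Z
assert abs(b) <= abs(d)                          # the conclusion
print("all inequalities hold")
```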

Another answer:

Assume the samples $X=\{x_i\}$ and $Y=\{y_i\}$ are independent, and that the sample $Z = \{z_i\}$ is normalized.

Let $n_i$ denote standard Gaussian noise samples.

Let us consider the additive regression model with centered and normalized functions $f(x),g(y)$, in the form $$\begin{cases} z_i = af(x_i)+bg(y_i)+cn_i, & (1.1)\\ \sum\limits_{i=1}^k f(x_i) = \sum\limits_{i=1}^k g(y_i) = \sum\limits_{i=1}^k z_i = 0, & (1.2)\\ \sum\limits_{i=1}^k f^2(x_i) = 1,\quad \sum\limits_{i=1}^k g^2(y_i) = 1,\quad \sum\limits_{i=1}^k f(x_i)g(y_i)\simeq0, & (1.3)\\ \sum\limits_{i=1}^k z^2_i = 1,\quad \sum\limits_{i=1}^k f(x_i)z_i = \beta_X^*,\quad \sum\limits_{i=1}^k g(y_i)z_i = \beta_Y^*, & (1.4)\\ \sum\limits_{i=1}^k f(x_i)n_i\simeq0,\quad \sum\limits_{i=1}^k g(y_i)n_i\simeq0,\quad \sum\limits_{i=1}^k n_i^2\simeq1. & (1.5) \end{cases}$$ In accordance with the model, conditions $(1.4)$ can be presented in the form of $$a^2 + b^2 + c^2 \simeq 1\quad\text{wherein}\quad a \simeq \beta_X^*,\quad b \simeq \beta_Y^*.$$ Therefore, the model of the signal is $$z_i = \beta_X^*f(x_i)+\beta_Y^*g(y_i) +cn_i,\quad\text{wherein} \quad \beta_Y^{*2} \simeq 1-c^2-\beta_X^{*2}.\tag2$$ The model of the signal after the first step of regression is $$Z' = Z - \beta_Xf(X)= (\beta_X^*-\beta_X)f(X)+\beta_Y^*g(Y)+cN,$$ where the $X$-term is the systematic error.

It is easy to show that the systematic error reduces the $Y$-correlation.

I.e. $$\boxed{\phantom{\bigg|}|\beta_Y| \le |\beta_Y^*|.}$$
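This answer's decomposition can be checked numerically. A sketch with my own data: here $\beta_X$ is the step-one coefficient $\sum_i f(x_i)z_i$, and the check confirms that the two-step coefficient $\beta_Y$ differs from $\beta_Y^*$ exactly by the systematic-error term $\beta_X\sum_i f(x_i)g(y_i)$, which vanishes when $f$ and $g$ are exactly orthogonal (condition (1.3) only makes it approximately zero):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 500

def center_norm(t):
    """Center and normalize a sample, matching conditions (1.2)-(1.3)."""
    t = t - t.mean()
    return t / np.sqrt(t @ t)

fx = center_norm(rng.normal(size=k))   # f(x_i)
gy = center_norm(rng.normal(size=k))   # g(y_i), nearly orthogonal to fx
noise = center_norm(rng.normal(size=k))
z = center_norm(0.6 * fx + 0.5 * gy + 0.3 * noise)

beta_x_star = fx @ z                   # correlations from (1.4)
beta_y_star = gy @ z

# First regression step: remove the fitted X-part, then correlate with g(Y).
beta_x = fx @ z                        # step-one coefficient (fx is normalized)
z_prime = z - beta_x * fx
beta_y = gy @ z_prime

# The gap between the two coefficients is exactly the systematic-error term.
gap = beta_x * (fx @ gy)
assert abs((beta_y_star - beta_y) - gap) < 1e-12
```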