Population versions of multiple correlation coefficients and least squares estimates


I'm reading an old paper (Wold and Faxer (1957)) which considers the theoretical relation $$ y=\beta_1x_1+\cdots+\beta_hx_h+\zeta $$ where $y,x_1,\ldots,x_h,\zeta$ are (scalar) random variables with zero means. Moreover, (a) the disturbance $\zeta$ has finite variance $\sigma^2(\zeta)$, and (b) none of the explanatory variables $x_1,\ldots,x_h$ is an exact linear function of the others. Let $b_1,\ldots,b_h$ be the coefficients of the least squares regression of $y$ on $x_1,\ldots,x_h$.
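To fix ideas, here is a quick simulation of this setup (the particular numbers and coefficients are hypothetical choices of mine): when $\zeta$ is uncorrelated with the $x$'s, the least squares coefficients $b$ coincide with the structural $\beta$'s, so claim (2) below is about what happens when those correlations $r_i$ are nonzero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 500_000, 3
beta = np.array([2.0, -1.0, 0.5])          # hypothetical structural coefficients

# correlated zero-mean explanatory variables
X = rng.standard_normal((n, h)) @ rng.standard_normal((h, h))
zeta = rng.normal(scale=0.8, size=n)       # disturbance, independent of X here
y = X @ beta + zeta

# sample least squares approximates the population regression for large n
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)                                   # close to [2.0, -1.0, 0.5]
```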

The paper then claims:

  1. $R_i=R_{i(1,2,\ldots,i-1,i+1,\ldots,h)}$ (the multiple correlation coefficient of $x_i$ with $x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_h$) satisfies $1-R_i^2=\frac{P}{P_{ii}}>0$ (the $P$'s are defined below).
  2. $b_1-\beta_1=\sigma(\zeta)\sum_{i=1}^h\frac{r_iP_{1i}}{\sigma(x_1)P}$ where $r_i$ denotes the correlation between $x_i$ and $\zeta$.

$P$ is defined as follows: let $\rho_{ij}$ be the correlation between $x_i$ and $x_j$; then $$ P\equiv\begin{vmatrix}\rho_{11}&\cdots&\rho_{1h}\\ \vdots & \ddots & \vdots \\ \rho_{h1} & \cdots & \rho_{hh}\end{vmatrix} $$ is the determinant of the correlation matrix of the $x$'s (claims 1 and 2 use $P$ as a scalar, so it must be the determinant rather than the matrix itself), and $P_{ij}$ denotes the $(i,j)$ cofactor of that matrix.
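For what it's worth, claim (1) checks out numerically when $P$ is read as the determinant of the correlation matrix. Here is the sketch I ran in NumPy (the variable names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
h, n = 4, 1000
X = rng.standard_normal((n, h)) @ rng.standard_normal((h, h))
X -= X.mean(axis=0)                        # zero means, as in the setup

P = np.corrcoef(X, rowvar=False)           # correlation matrix of x_1..x_h
i = 0                                      # check the claim for x_1
minor = np.delete(np.delete(P, i, 0), i, 1)
P_ii = np.linalg.det(minor)                # (i,i) cofactor of P

# R_i^2 is the R^2 of regressing x_i on the remaining x's
others = np.delete(X, i, 1)
coef, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
resid = X[:, i] - others @ coef
R2 = 1 - resid.var() / X[:, i].var()

print(1 - R2, np.linalg.det(P) / P_ii)     # the two numbers coincide
```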

I haven't worked much with population linear regressions, so all I know is $$ b=[\text{Var}(X)]^{-1}\text{Cov}(X,y)\tag{$*$} $$ where $b=(b_1,\ldots,b_h)'$ and $X=(x_1,\ldots,x_h)'$. Can someone please explain how claims (1) and (2) above follow? I tried to derive (2) from ($*$) but couldn't simplify the resulting expression. For (1), I'm completely stumped.
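Here is how far I got numerically. Since $y=\sum_i\beta_ix_i+\zeta$ gives $\text{Cov}(X,y)=\text{Var}(X)\beta+\text{Cov}(X,\zeta)$, formula ($*$) yields $b-\beta=[\text{Var}(X)]^{-1}\text{Cov}(X,\zeta)$, and with $\text{Cov}(x_i,\zeta)=r_i\,\sigma(x_i)\,\sigma(\zeta)$ the two sides of claim (2) can be compared exactly (the matrices below are arbitrary choices of mine, and $P$ is again read as a determinant):

```python
import numpy as np

h = 3
D = np.diag([1.5, 0.7, 2.0])               # sigma(x_i), chosen freely
P = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.5],
              [0.2, 0.5, 1.0]])            # correlation matrix of the x's
VarX = D @ P @ D                           # Var(X) in terms of D and P
sigma_zeta = 0.8
r = np.array([0.1, -0.2, 0.3])             # r_i = corr(x_i, zeta)

# Cov(x_i, zeta) = r_i * sigma(x_i) * sigma(zeta)
cov_X_zeta = r * np.diag(D) * sigma_zeta
# from (*): b - beta = Var(X)^{-1} Cov(X, zeta); take the first component
lhs = np.linalg.solve(VarX, cov_X_zeta)[0]

def cofactor(M, i, j):
    minor = np.delete(np.delete(M, i, 0), j, 1)
    return (-1) ** (i + j) * np.linalg.det(minor)

# right-hand side of claim (2): sigma(zeta) * sum_i r_i P_{1i} / (sigma(x_1) P)
rhs = sigma_zeta * sum(r[i] * cofactor(P, 0, i) for i in range(h)) \
      / (D[0, 0] * np.linalg.det(P))
print(lhs, rhs)                            # agree up to float error
```

So the identity holds numerically; what I'm missing is the algebraic route from ($*$) to the cofactor expression.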

I would also really appreciate it if you could recommend a self-contained reference on population linear regression; most of the texts I have deal mainly with sample linear regressions. Thank you for your time.