Unclear reasoning on why orthogonality implies non-correlation/independence in linear regression


In courses, I have come across the reasoning below at least twice, but I don't understand it. It is in the context of linear regression.

[image: course slide stating that, by the assumption of linearity, the residuals are orthogonal to the covariates and hence uncorrelated with them]

$e$ is the vector of raw residuals. I don't know what is meant by "assumption of linearity". I agree that, denoting by $H$ the projection matrix onto the column space of $X$ (the matrix of covariates), we have $(I-H)X=0$ and hence: $$X^Te=(e^TX)^T=(y^T(I-H)X)^T=0$$
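To make this concrete, here is a quick numerical check of $X^\top e = \mathbf{0}$ on synthetic data (a sketch; the dimensions and coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
# design matrix with an intercept column plus two random features
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# hat (projection) matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ y  # raw residuals e = (I - H) y

print(np.allclose(X.T @ e, 0))  # X^T e vanishes up to floating-point error
```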

How does this relate to correlation? I assume empirical correlation is meant, because the matrix X is fixed and not a random variable.



BEST ANSWER

The assumption of linearity means we assume the linear model $$y=X \beta + \epsilon$$ where $X \in \mathbb{R}^{n \times p}$ is observed and fixed, the coefficients $\beta \in \mathbb{R}^p$ are unobserved and fixed, and the noise term $\epsilon \in \mathbb{R}^n$ is unobserved and random. We do observe the response $y \in \mathbb{R}^n$, but it is random through its dependence on $\epsilon$.
In short, the assumption of linearity basically means $y$ depends linearly on $X$.
An OLS estimator $\hat{\beta}_{OLS}$ of $\beta$ is defined to be any vector $b \in \mathbb{R}^p$ that minimizes $\|y-Xb\|^2$. Then we define the residual vector $e:=y-X \hat{\beta}_{OLS}$. Your statement and derivation that $X^\top e= \mathbf{0}_p$ are correct.
As for why this relates to correlation, note that each column $X_j$ of $X$ corresponds to a feature, and the columns of $X$ are the rows of $X^\top$: $$X^\top = \begin{pmatrix}X_1^\top \\ \vdots \\ X_p^\top \end{pmatrix}$$ Here we actually need one more assumption, namely that the intercept is a feature, e.g., $X_1= \mathbf{1}_n$. Then $X^\top e= \mathbf{0}_p$ implies $$0= \mathbf{1}_n^\top e= \sum_{i=1}^ne_i$$ In other words, the residuals have mean zero.
Now consider any other feature, e.g., $X_2$. Its sample covariance with the residuals is defined as $$\text{Cov}(X_2,e):= \frac{1}{n-1}\sum_{i=1}^n(x_{i2}- \bar{x}_2)(e_i- \bar{e})$$ Since the residuals have zero mean, $\bar{e}=0$, so $$\text{Cov}(X_2,e)= \frac{1}{n-1}\sum_{i=1}^n(x_{i2}- \bar{x}_2)e_i= \frac{1}{n-1}\big\{ \sum_{i=1}^nx_{i2}e_i- \bar{x}_2 \sum_{i=1}^ne_i \big\}$$ Again using $\sum_{i=1}^ne_i=0$, we have $$ \text{Cov}(X_2,e)= \frac{1}{n-1}\sum_{i=1}^nx_{i2}e_i= \frac{1}{n-1}X_2^\top e =0$$ Since zero covariance is the same as being uncorrelated, this shows how the orthogonality condition $X^\top e = \mathbf{0}_p$ translates into zero empirical correlation between each feature and the residuals.
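The whole argument can be verified numerically (a sketch with synthetic data; the features and coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# intercept column plus two arbitrary features
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
e = y - X @ beta_hat                               # residual vector

print(np.isclose(e.mean(), 0.0))  # mean-zero residuals (intercept column)
# sample covariance of each non-intercept feature with the residuals is zero
for j in range(1, X.shape[1]):
    print(np.isclose(np.cov(X[:, j], e, ddof=1)[0, 1], 0.0))
```

Note that the mean-zero property, and with it the equivalence between orthogonality and zero sample covariance, relies on the intercept column being in $X$.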

ANOTHER ANSWER

I may have misunderstood your question, but take this as a sort of proof of why this is so:

  1. Assume you have two lines, A and B, which are orthogonal
  2. Change your coordinate system so that A lines up with the X axis
  3. Being perpendicular to A, B now lies entirely in the span of the remaining axes (Y, Z, ...)
  4. This means that A varies purely along X, while B varies along completely different directions, so they depend on completely different variables, hence they are uncorrelated