A neat proof of equality involving projection matrices that is reminiscent of Cauchy-Schwarz


Trying to prove: $$ \|X^\top(I - H_0)Y\| = \|(H-H_0)Y\|\, \|(I - H_0)X\| \tag{1} $$

Here's where this expression comes from.

In a linear regression $$ Y = X\beta + \varepsilon, $$ I define two (standard) projection matrices: the projection onto the subspace spanned by the columns of the design matrix $X$, $$ H := X(X^\top X)^{-1} X^\top, $$ and the projection onto the one-dimensional subspace spanned by the vector $(1,\ldots, 1)$, $$ H_0 := \frac{1}{n}\mathbf{1} \mathbf{1}^\top. $$ (Note that one of the columns of $X$ is, by convention, the vector $(1,\ldots, 1)$, so we must have $HH_0 = H_0$.)
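These properties are easy to verify numerically. Here is a minimal sketch with synthetic data (the seed, sample size, and single covariate are illustrative assumptions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])     # design matrix with intercept column

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix X (X^T X)^{-1} X^T
H0 = np.full((n, n), 1.0 / n)            # projection onto span{(1, ..., 1)}

# Both are orthogonal projections, and H H0 = H0 since 1 is a column of X.
assert np.allclose(H @ H, H) and np.allclose(H, H.T)
assert np.allclose(H0 @ H0, H0) and np.allclose(H0, H0.T)
assert np.allclose(H @ H0, H0)
```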

According to my calculations (based on a standard result from linear regression, namely that R-squared equals the square of the sample correlation coefficient, $R^2 = r_{xy}^2$; please see below), it must be true that:

$$ \|X^\top(I - H_0)Y\| = \|(H-H_0)Y\|\, \|(I - H_0)X\| \tag{1} $$

which I find a bit odd: it reminds me of Cauchy-Schwarz, but I could not decipher it that way.

My question:

Is there an easy way (e.g., geometric, or inner product interpretation) to see why $(1)$ must be true?

The details are below.

Note: I have asked this question on Cross Validated; I now think the question may be more on the linear algebra side, so I have decided to post it here as well and cross-reference. If I make progress, I will keep only one of the two questions, to avoid duplicates, and update it with an answer.



Details:

With the projection matrices $H$ and $H_0$ above, define the standard quantities associated with a linear regression:

\begin{align} S_{YY} &:= \sum_{i=1}^n(y_i - \bar{y})^2 = \|(I - H_0)Y\|^2\,, \\ S_{XX} &:= \sum_{i=1}^n(x_i - \bar{x})^2 = \|(I - H_0)X\|^2\,, \\ S_{XY} &:= \sum_{i=1}^n(x_i - \bar{x})(y_i - \bar{y})\,, \quad\text{so that } |S_{XY}| = \|X^\top(I - H_0)Y\| = \|Y^\top(I - H_0)X\|\,,\\ RSS &:= \sum_{i=1}^n(y_i - \hat{y}_i)^2 = \|(I-H)Y\|^2\,,\\ SS_{reg} &:= \sum_{i=1}^n(\hat{y}_i - \bar{\hat{y}})^2 = \sum_{i=1}^n(\hat{y}_i - \bar{y})^2 = \|(H-H_0)Y\|^2\,, \end{align}

where the last line uses $\bar{\hat{y}} = \bar{y}$, which holds because the intercept is included in the model.
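These correspondences can be verified numerically. Below is a sketch with synthetic data (the seed, sizes, and simple-regression setup are illustrative assumptions); note that $\|(I-H_0)X\|$ is read as a Frobenius norm, and the intercept column of $X$ centers to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x])    # intercept + one covariate
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
H0 = np.full((n, n), 1.0 / n)           # projection onto span{(1, ..., 1)}
I = np.eye(n)
y_hat = H @ y                           # fitted values

S_yy = np.sum((y - y.mean()) ** 2)
S_xx = np.sum((x - x.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

assert np.isclose(S_yy, np.linalg.norm((I - H0) @ y) ** 2)
assert np.isclose(S_xx, np.linalg.norm((I - H0) @ X) ** 2)   # Frobenius norm
assert np.isclose(abs(S_xy), np.linalg.norm(X.T @ (I - H0) @ y))
assert np.isclose(np.sum((y - y_hat) ** 2), np.linalg.norm((I - H) @ y) ** 2)
assert np.isclose(np.sum((y_hat - y.mean()) ** 2), np.linalg.norm((H - H0) @ y) ** 2)
```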

Now, on the one hand, $$ R^2:= \frac{\sum_{i=1}^n(\hat{y}_i - \bar{\hat{y}})^2 }{\sum_{i=1}^n(y_i - \bar{y})^2} = \frac{SS_{reg} }{S_{YY}} = \frac{\|(H-H_0)Y\|^2}{\|(I - H_0)Y\|^2}, $$

and on the other hand

$$ r^2_{xy}:= \frac{\left(\sum_{i=1}^n(x_i -\bar{x})(y_i -\bar{y})\right)^2}{\sum_{i=1}^n(x_i -\bar{x})^2\sum_{i=1}^n(y_i -\bar{y})^2} = \frac{S_{XY}^2}{S_{XX}S_{YY}} = \frac{ \|X^\top(I - H_0)Y\|^2}{\|(I - H_0)X\|^2\, \|(I - H_0)Y\|^2}. $$

It is a well known fact that the square of sample correlation coefficient and R squared are equal, $r_{xy}^2 = R^2$, which yields that

$$ \frac{ \|X^\top(I - H_0)Y\|^2}{\|(I - H_0)X\|^2\, \|(I - H_0)Y\|^2} = \frac{\|(H-H_0)Y\|^2}{\|(I - H_0)Y\|^2}, $$ or equivalently $$ \|X^\top(I - H_0)Y\| = \|(H-H_0)Y\|\, \|(I - H_0)X\|. $$ The last expression looks odd: it reminds me of Cauchy-Schwarz, but I was not able to "decipher" it that way. Is there an easy way to see why $(1)$ must be true?
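The claimed identity $(1)$ can be checked numerically; here is a sketch for simple regression with an intercept (synthetic data; the seed and sizes are arbitrary assumptions). In this setting the equality appears to hold for an arbitrary response vector, not just one generated by the model:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25
x = rng.normal(size=n)
y = rng.normal(size=n)                  # arbitrary response vector

X = np.column_stack([np.ones(n), x])    # intercept + one covariate
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
H0 = np.full((n, n), 1.0 / n)
I = np.eye(n)

# Identity (1): ||X^T (I - H0) Y|| = ||(H - H0) Y|| * ||(I - H0) X||_F
lhs = np.linalg.norm(X.T @ (I - H0) @ y)
rhs = np.linalg.norm((H - H0) @ y) * np.linalg.norm((I - H0) @ X)
assert np.isclose(lhs, rhs)
```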

I would appreciate any help.

Best answer:

Define the special matrices
$$\eqalign{ C &= (I - H_0) \qquad&\big({\rm Centering\,Matrix}\big) \\ H &= X(X^TX)^{-1}X^T \qquad&\big({\rm Hat\,Matrix}\big) \\ }$$

Both are orthoprojectors,
$$\eqalign{ C^2 &= C = C^T \\ H^2 &= H = H^T \\ }$$

and $H$ has special relationships with $X$ and $H_0$:
$$\eqalign{ HX &= X &\implies X^T = X^TH \\ HH_0 &= H_0 &\implies H_0 = H_0H \\ }$$

Expand their mutual products to see that $H$ and $C$ commute with each other:
$$\eqalign{ HC &= (HI-HH_0) = (H-H_0) \\ CH &= (IH-H_0H) = (H-H_0) \\ }$$

Now expand the key quantity of the current problem, and evaluate its sub-multiplicative norm:
$$\eqalign{ Q = X^T(I-H_0)Y &= X^TCY \\ &= (X^TH)CCY \\ &= X^TCHCY \\ \|Q\| = \|X^TC\cdot HCY\| &\le \|X^TC\|\cdot\|HCY\| \\ }$$

So my conclusion is the following inequality:
$$ \|X^T(I-H_0)Y\| \;\le\; \|(I-H_0)X\|\cdot\|(H-H_0)Y\| $$

Similarly, lots of other inequalities can be derived:
$$\eqalign{ \|Q\| &\le \|(I-H_0)X\|\cdot\|Y\| \\ \|Q\| &\le \|X\|\cdot\|(I-H_0)Y\| \\ \|Q\| &\le \|(H-H_0)X\|\cdot\|Y\| \\ \|Q\| &\le \|X\|\cdot\|(H-H_0)Y\| \\ \|Q\| &\le \|(I-H_0)\|\cdot\|X\|\cdot\|Y\| \\ \|Q\| &\le \|(H-H_0)X\|\cdot\|(H-H_0)Y\| \\ }$$
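The main bound above can be sanity-checked numerically. Here $X$ carries several covariates, so only the inequality (not the question's one-covariate equality) is expected; the data, seed, and sizes are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p covariates
y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
H0 = np.full((n, n), 1.0 / n)
C = np.eye(n) - H0                      # centering matrix

Q = X.T @ C @ y
normQ = np.linalg.norm(Q)

# Sub-multiplicative (Frobenius) norm bounds from the answer:
assert normQ <= np.linalg.norm(C @ X) * np.linalg.norm((H - H0) @ y) + 1e-9
assert normQ <= np.linalg.norm((H - H0) @ X) * np.linalg.norm((H - H0) @ y) + 1e-9
```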