Diagonalising $I-X(X^TX)^{-1}X^T$ (an $n\times n$ matrix of rank $n-k$) into a matrix with $n-k$ ones along the diagonal, then zeros


For context, this query relates to the derivation of the distribution of the variance estimator in a linear regression. I’m doing this in a 3rd-year econometrics unit, which takes a very matrix-algebra-heavy approach to all things linear regression, but since it is a unit run by the economics department, they don’t really get into the linear algebra weeds. Seeking more clarity on this particular derivation, I’ve gone to the textbook the course is based on, and I understand the fully rigorous (or at least I think fully rigorous) argument presented there apart from one key bit. Let $I_n$ be the $n$-dimensional identity matrix and $X$ be an $n\times k$ matrix with rank $k$. $I_n-X(X^TX)^{-1}X^T$ is seen to be symmetric and idempotent, and it has rank $n-k$. Thus, I understand by the spectral theorem there exists an orthogonal matrix $P$ such that $$P^T(I_n-X(X^TX)^{-1}X^T)P=\operatorname{diag}(\lambda_1,\dots,\lambda_{n-k},0,\dots,0).$$ However, the book additionally asserts that there exists an orthogonal matrix $P$ such that $$P^T(I_n-X(X^TX)^{-1}X^T)P=\operatorname{diag}(\underbrace{1,\dots,1}_{n-k},0,\dots,0),$$ i.e. with $n-k$ ones along the diagonal. I can only imagine it has something to do with the eigenstructure of $I_n-X(X^TX)^{-1}X^T$, but I can't figure out why $I_n-X(X^TX)^{-1}X^T$ would have only the single repeated nonzero eigenvalue $1$, which I think is what would permit this.
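(Not from the textbook, just my own sanity check: the claimed diagonalisation is easy to verify numerically with NumPy, assuming a random $X$, which almost surely has full column rank.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
X = rng.standard_normal((n, k))  # random X: almost surely full column rank

# M = I_n - X(X^T X)^{-1} X^T
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# M is symmetric, so eigh returns an orthogonal P with P^T M P diagonal
eigvals, P = np.linalg.eigh(M)
D = P.T @ M @ P
print(np.round(np.diag(D), 10))  # n - k ones and k zeros (ascending order)
```

The diagonal does indeed come out as $k$ zeros and $n-k$ ones, which is the book's claim.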
Additionally, whilst trying to fill in the gaps of the derivation my econometrics lecturer gave, I came to a different result, and I can’t see what is wrong with my logic; if my result were true, the distribution that is ultimately being sought would be different from the one the above result gives. We know that for some $n$-dimensional vectors $\underline{\hat{u}}$ and $\underline{u}$, $$\underline{\hat{u}}^T\underline{\hat{u}}=\underline{u}^T(I_n-X(X^TX)^{-1}X^T)\underline{u}.$$ Then, given that we know this is a scalar quantity, I argue $$\underline{\hat{u}}^T\underline{\hat{u}}=\operatorname{tr}(\underline{\hat{u}}^T\underline{\hat{u}})=\operatorname{tr}(\underline{u}^T(I_n-X(X^TX)^{-1}X^T)\underline{u})=\operatorname{tr}(\underline{u}\underline{u}^T)\bigl(\operatorname{tr}(I_n)-\operatorname{tr}((X^TX)^{-1}X^TX)\bigr)=\operatorname{tr}(\underline{u}\underline{u}^T)\bigl(\operatorname{tr}(I_n)-\operatorname{tr}(I_k)\bigr)=(n-k)\sum_{i=1}^n u_i^2.$$ What's wrong with this manipulation of the trace?

Best answer:

Regarding the eigenvalues: note that $A = I_n - X(X^TX)^{-1}X^T$ is not only symmetric but also, as you've said, idempotent (i.e. $A^2 = A$). With that in mind: if $\lambda$ is an eigenvalue of $A$, then there must be an associated eigenvector $x$ (so that $x \neq 0$ and $Ax = \lambda x$), which means that $$ \lambda x = Ax = A^2x = A(Ax) = A(\lambda x) = \lambda Ax = \lambda^2 x. $$ So, we have $\lambda x = \lambda^2 x$. Because $x\neq 0$, it must be that $\lambda = \lambda^2$. This means that we must have $\lambda = 0$ or $\lambda = 1$.
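A quick numerical illustration of this (my addition, with a random full-column-rank $X$ assumed): every eigenvalue of $A$ comes out numerically as $0$ or $1$, and since $\operatorname{tr}(A) = n - k$, the eigenvalue $1$ has multiplicity $n-k$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
X = rng.standard_normal((n, k))
A = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(A @ A, A)   # idempotent: A^2 = A
lam = np.linalg.eigvalsh(A)
print(np.round(lam, 10))       # every eigenvalue satisfies λ^2 = λ
print(int(round(lam.sum())))   # sum of eigenvalues = tr(A) = n - k
```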

The more general result is that if there is a polynomial $p(x)$ such that the matrix $A$ satisfies $p(A) = 0$, then it must hold that all eigenvalues of $A$ satisfy $p(\lambda) = 0$. In the case of an idempotent $A$, this holds with the polynomial $p(x) = x^2 - x$.
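To see the general result with a different polynomial (my own example, not tied to regression): a Householder reflection satisfies $A^2 = I$, i.e. $p(A) = 0$ for $p(x) = x^2 - 1$, so all of its eigenvalues must be $\pm 1$.

```python
import numpy as np

# Householder reflection H = I - 2vv^T (v a unit vector) satisfies H^2 = I,
# so p(H) = 0 for p(x) = x^2 - 1, forcing every eigenvalue to be +1 or -1.
v = np.array([1.0, 2.0, 2.0])
v /= np.linalg.norm(v)
H = np.eye(3) - 2.0 * np.outer(v, v)

assert np.allclose(H @ H, np.eye(3))
print(np.round(np.linalg.eigvalsh(H), 10))  # one eigenvalue -1 (for v), two +1
```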

Regarding your manipulation of trace: by the cyclic property, it is correct to say that $$ \operatorname{tr}(\underline{u}^T(I_n-X(X^TX)^{-1}X^T)\underline{u})=\operatorname{tr}[(\underline{u}\underline{u}^T)(I_n-X(X^TX)^{-1}X^T)]. $$ From there, you seem to assume that $\operatorname{tr}(\underline{u}\underline{u}^T A) = \operatorname{tr}(\underline{u}\underline{u}^T) \operatorname{tr}(A)$ (where $A$ has its earlier definition), but this does not hold: trace is linear, not multiplicative. In fact $\operatorname{tr}(\underline{u}\underline{u}^T A) = \underline{u}^T A \underline{u}$, which is just the scalar you started from.
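A numerical check of this point (my addition, with a random $\underline{u}$ and $X$ assumed): $\operatorname{tr}(\underline{u}\underline{u}^T A)$ equals $\underline{u}^T A \underline{u}$, but not $\operatorname{tr}(\underline{u}\underline{u}^T)\operatorname{tr}(A)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 2
X = rng.standard_normal((n, k))
A = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
u = rng.standard_normal(n)

correct = np.trace(np.outer(u, u) @ A)            # = u^T A u (cyclic property)
assumed = np.trace(np.outer(u, u)) * np.trace(A)  # = (u^T u)(n - k)
print(correct, assumed)  # the two values differ: trace is not multiplicative
```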