Show that the least square estimator $\hat \beta$ for $\beta$ can be written as $\hat \beta=V D^{−1}U^TY$.

89 Views Asked by At

Consider a linear model $Y=X\beta+\varepsilon$ , where $Y,\varepsilon \in \Bbb R^n,\beta\in \Bbb R^p$ and with model matrix $X \in \Bbb R^{n×p}$ of full rank, $n, p\in \Bbb N$ with $1\lt p\le n$. Moreover consider the singular value decomposition of the model matrix X, i.e. $$\DeclareMathOperator{\diag}{diag} X=UDV^T, $$ where

  • $U$ and $D$ is $n\times p$ and a $p\times p$ diagonal matrices and
  • $V$ is a $p\times p$ matrix.

Notes

  • The diagonal elements of $D=\diag(\lambda_1,\lambda_2,\dots,\lambda_p)$ are the positive square roots of the eigenvalues of $X^TX$ or $XX^T$ and $U$ and $V$ contain normalized eigenvectors of $XX^T$ and $X^TX$ respectively, i.e.$ U^TU=I,V^TV=I.$
  • We moreover we assume that $\lambda_1\ge\lambda_2\ge \ldots\ge\lambda_p$.

Problem. Show that the least square estimator $\hat \beta$ for $\beta$ can be written as $\hat \beta=V D^{−1}U^TY$. An alternative estimator for $\beta$ is given by $\widetilde\beta:=V D^{−1}_* U^TY$, where $D^{−1}_*=\diag(\lambda^{-1}_1,\lambda^{-1}_2,\ldots,\lambda^{-1}_k,0,\ldots,0)$ for some $1\le k \lt p$. (Note: such an estimator can e.g. be useful if some covariates are highly correlated and $X^TX$ is close to singular.)

Answer: to show that the least square estimator $\hat \beta$ equals $V D^{−1}U^TY$, we start with the linear model $Y=X\beta+\varepsilon$ Using the singular value decomposition (SVD) of the model matrix $X$, $X=UDV^T$ we can rewrite the model as $$ Y=(UDV^T)\beta+\varepsilon. $$ Multiplying both sides by the transpose of $U$, we get, $$ U^TY=(U^TUDV^T)\beta+U^T\varepsilon. $$

Since $U^TU = I$ i.e. it is the identity matrix, the term $(U^TUDV^T)$ simplifies to $DV^T\beta$. Therefore, the equation becomes $$ U^TY=(DV^T)\beta+U^T\varepsilon $$ To solve for $\beta$, we multiply both sides by $V$ and get $$ VU^TY = V(DV^T)\beta+ VU^T\varepsilon. $$ Simplifying further, since $V^TV = I$, we have $$ VU^TY = D \beta+ VU^T\varepsilon. $$ Subtracting $VU^T\varepsilon$ from both sides, we get $$ VU^TY - VU^T\varepsilon = D\beta. $$ Combining the terms on the left side, we have $$ V(U^TY - U^T\varepsilon) = D\beta. $$ Since $E$ is the vector of errors, $U^T\varepsilon = 0$, so the equation simplifies to $VU^TY = D\beta$. Therefore, the least squares estimator $\beta$ equals $VD^{-1}U^TY$.

My question. Am I doing it correctly? I just need a solution verification.