How can I interpret the following matrix multiplication?


The following lines are from a slide of an econometrics lesson. (The topic is Ordinary Least Squares estimation.)

"...By solving for $\widehat{\beta}$, we obtain the Ordinary Least Squares (OLS) estimator: $\widehat{\beta}=(X'X)^{-1}X'Y=(\sum_{i=1}^{n}X_{i}X_{i}')^{-1}\sum_{i=1}^{n}X_{i}Y_{i}$

Exercise: Verify that $\sum_{i=1}^{n}X_{i}X_{i}'=X'X.$ You may assume $k=3$ for simplicity..."
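As context for the slide's formula, here is a minimal numeric sketch (with made-up data and $k=3$ regressors, as the exercise suggests) that computes $\widehat{\beta}=(X'X)^{-1}X'Y$ and cross-checks it against NumPy's least-squares solver:

```python
import numpy as np

# Made-up data: n observations, k = 3 regressors.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
Y = rng.normal(size=(n, 1))

# beta_hat = (X'X)^{-1} X'Y -- solve the normal equations
# rather than explicitly inverting X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to forming $(X'X)^{-1}$ explicitly.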

I understand how we obtain $\widehat{\beta}$, but I don't get what $X_{i}$ stands for. As far as I know, $X$ is a matrix, but the term $X_{i}X_{i}'$ in $\sum_{i=1}^{n}X_{i}X_{i}'$ definitely looks like a product of two vectors. It can't be a dot product, because the dot product of two vectors is a scalar, not a matrix like $A=X'X$. So what is $X_{i}$? The $i$th row (or column) of the matrix $X$? Even if it is, I still don't understand the exercise. Could someone solve it for me? Thank you in advance.
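The shape mismatch the question raises can be checked directly. A sketch, assuming the common textbook reading that $X_i$ is the $i$th row of $X$ stored as a $k\times 1$ column vector: then $X_iX_i'$ is an outer product (a $k\times k$ matrix), while $X_i'X_i$ would be the scalar dot product.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
X = rng.normal(size=(n, k))

# Read X_i as the i-th row of X, written as a k x 1 column vector.
X_rows = [X[i].reshape(k, 1) for i in range(n)]

outer = X_rows[0] @ X_rows[0].T  # X_i X_i': outer product, a k x k matrix
inner = X_rows[0].T @ X_rows[0]  # X_i' X_i: dot product, a 1 x 1 scalar
print(outer.shape, inner.shape)  # (3, 3) (1, 1)

# Summing the outer products over i recovers X'X.
S = sum(Xi @ Xi.T for Xi in X_rows)
print(np.allclose(S, X.T @ X))   # True
```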


Best answer:

Consider the singular value decomposition of $X\in\mathbb R^{m\times n}$, i.e., $X = U\Sigma V^T$ with $U\in\mathbb R^{m\times m}$ orthogonal, $V\in\mathbb R^{n\times n}$ orthogonal, and $\Sigma\in\mathbb R^{m\times n}$ diagonal. Then
$$ X^TX = (U\Sigma V^T)^TU\Sigma V^T = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T. $$
Let the diagonal entries of $\Sigma$ be given by $\sigma_1,\ldots,\sigma_k$, where $k=\min\{m,n\}$. Then $\Sigma^T\Sigma = \text{diag}(\sigma_1^2,\ldots,\sigma_k^2)$ if $m\ge n$, or $\Sigma^T\Sigma = \text{diag}(\sigma_1^2,\ldots,\sigma_k^2,0,\ldots,0)$ if $n>m$. In the latter case, we set $\sigma_{m+1}=\ldots=\sigma_n = 0$. Then
\begin{align} X^TX &= \big[v_1|\ldots|v_n\big]\begin{bmatrix}\sigma_1^2&&0\\&\ddots &\\0&&\sigma_n^2\end{bmatrix}\begin{bmatrix}v_1^T\\\vdots\\v_n^T\end{bmatrix} = \big[v_1|\ldots|v_n\big]\begin{bmatrix}\sigma_1^2v_1^T\\\vdots\\\sigma_n^2v_n^T\end{bmatrix} = \sum_{i=1}^n\sigma_i^2v_iv_i^T. \end{align}
If we now set $X_i := \sigma_iv_i$, then $X^TX = \sum_{i=1}^nX_iX_i^T$.

Now, let $\Sigma_0 = \text{diag}(\sigma_1,\ldots,\sigma_k)$, and set $Z = U^TY$ (padded with zeros in the case $n>m$). Then either ($m\ge n$)
$$ X^TY = V\Sigma^TZ = \big[v_1|\ldots|v_n\big]\big[\Sigma_0\,|\,0\big]\begin{bmatrix}z_1\\\vdots\\z_m\end{bmatrix} = \big[v_1|\ldots|v_n\big]\begin{bmatrix}\sigma_1z_1\\\vdots\\\sigma_nz_n\end{bmatrix} = \sum_{i=1}^nz_i\sigma_iv_i $$
or ($n>m$)
$$ X^TY = V\Sigma^TZ = \big[v_1|\ldots|v_n\big]\begin{bmatrix}\Sigma_0\\0\end{bmatrix}\begin{bmatrix}z_1\\\vdots\\z_m\end{bmatrix} = \big[v_1|\ldots|v_n\big] \begin{bmatrix}\sigma_1z_1\\\vdots\\\sigma_nz_n\end{bmatrix} = \sum_{i=1}^nz_i\sigma_iv_i, $$
where we used the above convention $\sigma_{m+1}=\ldots=\sigma_n = 0$. So, if we put $Y_i = z_i$, then $X^TY = \sum_{i=1}^nX_iY_i$. Hence, finally,
$$ (X^TX)^{-1}X^TY = \left(\sum_{i=1}^nX_iX_i^T\right)^{-1}\sum_{i=1}^nX_iY_i, $$
where $X_i = \sigma_iv_i$ and $Y_i = (U^TY)_i$.
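The SVD-based construction above can be checked numerically. A sketch (with made-up data, taking the $m\ge n$ case so no zero-padding of the singular values is needed): build $X_i=\sigma_iv_i$ and $Y_i=(U^TY)_i$ from the SVD and verify both identities.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3                        # m >= n; the n > m case pads sigma with zeros
X = rng.normal(size=(m, n))
Y = rng.normal(size=(m,))

U, s, Vt = np.linalg.svd(X)        # X = U diag(s) V^T, s has min(m, n) entries
V = Vt.T

# X_i = sigma_i v_i  and  Y_i = (U^T Y)_i, as constructed above.
X_vecs = [s[i] * V[:, [i]] for i in range(n)]
z = U.T @ Y

# Check: sum_i X_i X_i^T = X^T X
S1 = sum(Xi @ Xi.T for Xi in X_vecs)
print(np.allclose(S1, X.T @ X))          # True

# Check: sum_i X_i Y_i = X^T Y
S2 = sum(z[i] * X_vecs[i] for i in range(n))
print(np.allclose(S2.ravel(), X.T @ Y))  # True
```

Note this $X_i$ is a transformed vector coming from the SVD, not literally a row or column of $X$, which is what makes the decomposition in the answer work for any $X$.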