Can we rewrite the solution to the linear regression in other form?

Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail) on 2026-04-01. Two answers below.

I know the solution to the linear regression problem is $\beta=(X^TX)^{-1}X^TY$, and I want to know whether it can be rewritten as $X^T(XX^T)^{-1}Y$ or not, and why. Thanks for your help.
If both $X^TX$ and $XX^T$ are invertible, then $X$ is square and invertible, so $$ (X^TX)^{-1}X^TY = X^{-1}(X^T)^{-1}X^TY = X^{-1}Y, $$ and $$ X^T(XX^T)^{-1}Y = X^T(X^T)^{-1}X^{-1}Y = X^{-1}Y, $$ so in this case $\beta$ can be written in either of the two forms you mentioned. But if one of $X^TX$ and $XX^T$ fails to be invertible (which in particular is the case whenever $X$ is not a square matrix), then the answer is no.
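The square-invertible case above is easy to check numerically. A minimal sketch with NumPy, using an arbitrary random $3\times 3$ matrix (the specific values are illustrative; a Gaussian random square matrix is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))  # square, invertible (almost surely)
y = rng.standard_normal(3)

beta1 = np.linalg.inv(X.T @ X) @ X.T @ y   # (X^T X)^{-1} X^T y
beta2 = X.T @ np.linalg.inv(X @ X.T) @ y   # X^T (X X^T)^{-1} y
beta3 = np.linalg.solve(X, y)              # X^{-1} y

# All three expressions agree when X is square and invertible.
print(np.allclose(beta1, beta2), np.allclose(beta1, beta3))
```

For a non-square $X$, one of the two matrix products is singular and the corresponding `np.linalg.inv` call raises `LinAlgError`, matching the "answer is no" above.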
You are probably referring to two variants of the same linear regression problem $X\beta = y$, with $X \in \mathbb{R}^{m \times n}$: type 1, the underdetermined case ($m < n$, fewer equations than unknowns), and type 2, the overdetermined case ($m > n$, more equations than unknowns).
For a problem of type 1 (underdetermined), there are infinitely many solutions (no unique solution).
One can decide to pick, among these infinitely many solutions, the one with the smallest squared Euclidean norm, which leads to the following optimization problem: \begin{equation} \begin{array}{rrclcl} \displaystyle \min_{\beta} & {||\beta||_{2}^{2}} \\ \textrm{s.t.} & X \beta & = & y \\ \end{array} \end{equation}
Write the Lagrangian $L(\beta,\mu) = ||\beta||_{2}^{2} + \mu^{T}(y - X \beta)$.
Taking the partial derivative with respect to $\beta$ and setting it to $0$: $\frac{\partial L(\beta,\mu)}{\partial \beta} = 2\beta - X^{T}\mu = 0$
Taking the partial derivative with respect to $\mu$ and setting it to $0$: $\frac{\partial L(\beta,\mu)}{\partial \mu} = y - X\beta = 0$
Substituting the first equation into the second, and assuming that $XX^{T}$ is invertible, we obtain $\mu = 2(XX^{T})^{-1}y$. Plugging back into $2\beta = X^{T}\mu$ gives the solution $$\beta = X^{T}(XX^{T})^{-1}y. \tag{1}$$
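Solution (1) can be verified numerically. A minimal sketch with NumPy, using a hypothetical underdetermined system (2 equations, 5 unknowns; the values are arbitrary): the formula should solve $X\beta = y$ exactly, and for full-row-rank $X$ it coincides with the Moore-Penrose pseudoinverse solution, which is known to be the minimum-norm one.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 5))  # 2 equations, 5 unknowns: underdetermined
y = rng.standard_normal(2)

# Minimum-norm solution from equation (1): beta = X^T (X X^T)^{-1} y
beta = X.T @ np.linalg.inv(X @ X.T) @ y

# It satisfies the constraint X beta = y exactly...
print(np.allclose(X @ beta, y))
# ...and matches the pseudoinverse (minimum-norm) solution.
print(np.allclose(beta, np.linalg.pinv(X) @ y))
```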
For a problem of type 2 (overdetermined), the system generally has no exact solution (it is inconsistent). Thus we can only hope to find a good approximation, in the sense of minimizing the energy of the error $J(\beta) = ||y-X\beta||_{2}^{2}$.
We can formulate in this case the unconstrained optimization problem: \begin{equation} \begin{array}{rrclcl} \displaystyle \min_{\beta} & {||y-X\beta||_{2}^{2}} \\ \end{array} \end{equation}
Taking the derivative with respect to $\beta$ and setting it to $0$: $\frac{\partial ||y-X\beta||_{2}^{2}}{\partial \beta} = -2X^{T}y + 2X^{T}X\beta = 0$, we get the normal equations $X^{T}X\beta = X^{T}y$.
The final solution (assuming $X^{T}X$ is invertible): $$\beta = (X^{T}X)^{-1}X^{T}y. \tag{2}$$
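Solution (2) can likewise be checked numerically. A minimal sketch with NumPy, using a hypothetical overdetermined system (6 equations, 2 unknowns; the values are arbitrary), comparing the normal-equations formula against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2))  # 6 equations, 2 unknowns: overdetermined
y = rng.standard_normal(6)

# Least-squares solution from equation (2): beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y

# Matches NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_lstsq))
```

In practice one would call `np.linalg.lstsq` (or solve the normal equations with `np.linalg.solve`) rather than forming the explicit inverse, which is slower and less numerically stable.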
You can now see that (1) and (2) are both answers to a regression problem, but with different setups, so they are not interchangeable (with the single exception that both $X^{T}X$ and $XX^{T}$ are invertible, which forces $X$ to be square and invertible; in that case both reduce to $X^{-1}y$).