Matrix equation: why can't I simplify by multiplying by the inverse matrix?


I know, it's standard stuff, but I could not find a good place where I could read about this. Please help me understand, or refer me to some good source where I could learn. Thank you very much!

I want to solve the equation:

$X^TX a = X^Ty$

where $X$ is a matrix and $y$ and $a$ are vectors.

Can I solve the equation by multiplying both sides by the inverse matrix of $X^T$?

$\color{red}{(X^T)^{-1}} X^TX a = \color{red}{(X^T)^{-1}} X^Ty $ ?

$Xa = y$

$a = X^{-1}y$

Or am I doing something wrong?

I watched a course where they wrote the solution as:

$a = (X^T X)^{-1}X^Ty$

but I don't understand why they didn't simplify the equation first.


There are 2 best solutions below


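If $X$ is square and invertible (so that $X^T$ is invertible too), then $$(X^TX)^{-1}X^Ty=X^{-1}(X^T)^{-1}X^Ty=X^{-1}y,$$ which is exactly your simplification. The solution is left as $a=(X^TX)^{-1}X^Ty$ because you generally cannot assume that $X$ and $X^T$ are invertible; if $X$ is not square, neither of them has an inverse at all. However, if $X$ is an $n$ by $m$ matrix, then $X^TX$ is an $m$ by $m$ matrix, and it is invertible provided the rank of $X$ is $m$.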
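Here is a small sketch of both situations in Python, using only the standard library and exact fractions. The helper names (`transpose`, `matmul`, `inv2`) and the example matrices are my own choices for illustration, not something from the course. For a square invertible $X$ the two formulas agree; for a tall $X$ of rank $2$, $X^T$ cannot be inverted, but $X^TX$ can.

```python
from fractions import Fraction

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(A):
    """Invert a 2x2 matrix exactly; raise ValueError if A is not 2x2 or is singular."""
    if len(A) != 2 or any(len(row) != 2 for row in A):
        raise ValueError("only a square 2x2 matrix can be inverted here")
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d) / det, Fraction(-b) / det],
            [Fraction(-c) / det, Fraction(a) / det]]

# Case 1: X is square and invertible, so both formulas give the same a.
X = [[2, 1], [1, 3]]
y = [[3], [5]]
Xt = transpose(X)
a_square_direct = matmul(inv2(X), y)                           # a = X^{-1} y
a_square_normal = matmul(inv2(matmul(Xt, X)), matmul(Xt, y))   # a = (X^T X)^{-1} X^T y
print(a_square_direct == a_square_normal)

# Case 2: X is tall (3x2) with rank 2, so X^T is 2x3 and has no inverse.
X_tall = [[1, 0], [0, 1], [1, 1]]
y_tall = [[1], [2], [4]]
Xt_tall = transpose(X_tall)
try:
    inv2(Xt_tall)
    tall_transpose_invertible = True
except ValueError:
    tall_transpose_invertible = False
print(tall_transpose_invertible)

# X^T X is 2x2 and invertible, so the normal-equation formula still works.
a_tall = matmul(inv2(matmul(Xt_tall, X_tall)), matmul(Xt_tall, y_tall))
print(a_tall)
```

In the tall case the step "multiply both sides by $(X^T)^{-1}$" is simply not available, which is why the formula keeps $(X^TX)^{-1}$ as a single block.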


In the course that you watched, the matrix $X$ was a tall matrix, with more rows than columns, and it had full column rank. They wanted to solve $Xa=y$, but since that system was overdetermined (more equations than unknowns, so in general there is no exact solution), they switched to the so-called normal equation, i.e. $X^TXa=X^Ty$. By the full-rank assumption, the new matrix $X^TX$ could be inverted. This procedure solves the original system in the least squares sense.
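To spell out what solving in the least squares sense means, here is a short sketch of the standard derivation (the name $f$ is my own notation, and I assume $X$ is $n$ by $m$ with rank $m$). One looks for the $a$ that minimizes the squared error

$$f(a)=\|Xa-y\|^2=(Xa-y)^T(Xa-y)=a^TX^TXa-2a^TX^Ty+y^Ty.$$

Setting the gradient to zero gives

$$\nabla f(a)=2X^TXa-2X^Ty=0\quad\Longleftrightarrow\quad X^TXa=X^Ty,$$

which is the normal equation. The matrix $X^TX$ is invertible under the rank assumption, because $X^TXv=0$ implies $v^TX^TXv=\|Xv\|^2=0$, hence $Xv=0$ and therefore $v=0$. So the unique minimizer is $a=(X^TX)^{-1}X^Ty$.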