In multiple linear regression, the least-squares estimate of the coefficients is given by $$\hat\beta=(X^TX)^{-1}X^Ty$$
But we all know that $$(AB)^{-1}=B^{-1}A^{-1}$$ so it seems we can say that $$(X^TX)^{-1}X^Ty$$ $$=X^{-1}(X^T)^{-1}X^Ty$$ $$=X^{-1}y$$
and then
$$\hat\beta=X^{-1}y$$
Is that right? If so, why isn't the formula written this way?
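A small numerical sketch may help in checking the simplification above. It uses NumPy, and the data values are made up purely for illustration. The rule $(AB)^{-1}=B^{-1}A^{-1}$ needs $A$ and $B$ to be square and invertible. A typical design matrix has more rows (observations) than columns (predictors), so $X^{-1}$ does not exist even when $X^TX$ is invertible.

```python
import numpy as np

# Hypothetical data: 5 observations, 2 columns (intercept and one predictor).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Normal-equations estimate (X^T X)^{-1} X^T y.
# X^T X is 2x2 and invertible here, so this works even though X is 5x2.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# X itself is not square, so NumPy refuses to invert it.
try:
    np.linalg.inv(X)
    tall_has_inverse = True
except np.linalg.LinAlgError:
    tall_has_inverse = False

# Special case: a square, invertible X. Here the formula does reduce to X^{-1} y.
Xs = np.array([[2.0, 1.0],
               [1.0, 3.0]])
ys = np.array([1.0, 2.0])
beta_square = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
beta_direct = np.linalg.inv(Xs) @ ys

print("X (5x2) has an inverse:", tall_has_inverse)
print("square case agrees with X^{-1} y:", np.allclose(beta_square, beta_direct))
```

So $\hat\beta=X^{-1}y$ would only be valid in the special case where $X$ is square and invertible, which is not the usual regression setting.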
-===========
In multiple linear regression, the error of the estimator $b$ (that is, $\hat\beta$) is said to be found like this, using $y=X\beta+\epsilon$: $$b-\beta = (X^TX)^{-1}X^Ty-\beta$$ $$= (X^TX)^{-1}X^T(X\beta+\epsilon)-\beta$$ $$= (X^TX)^{-1}X^T\epsilon $$
I can't understand how to get from the second line to the third. If I assume that $(X^TX)^{-1}=X^{-1}(X^T)^{-1}$, it makes sense. But we don't know whether $X$ is square, let alone invertible. How is this step justified?
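One possible way to see the step, sketched under the single assumption that $X^TX$ is invertible (that is, $X$ has full column rank; $X$ itself need not be square): distribute $(X^TX)^{-1}X^T$ over the sum and group $X^TX$ as one matrix, without ever splitting its inverse.
$$(X^TX)^{-1}X^T(X\beta+\epsilon)-\beta$$ $$= (X^TX)^{-1}(X^TX)\beta+(X^TX)^{-1}X^T\epsilon-\beta$$ $$= \beta+(X^TX)^{-1}X^T\epsilon-\beta$$ $$= (X^TX)^{-1}X^T\epsilon$$
The only cancellation used is $(X^TX)^{-1}(X^TX)=I$, so $X^{-1}$ is never needed.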