In the derivation of the OLS estimator, having started with the two equations $E(u_t)=0$ and $E(u_tX_t)=0$ and having written these in matrix form, we obtain:
$$ \boldsymbol{X^T} \boldsymbol{X} \boldsymbol{\beta} = \boldsymbol{X^T} \boldsymbol{y}$$
Premultiplying both sides by $ (\boldsymbol{X^T} \boldsymbol{X})^{-1} $, assuming this inverse exists, we obtain:
$$ (\boldsymbol{X^T} \boldsymbol{X})^{-1}\boldsymbol{X^T} \boldsymbol{X} \boldsymbol{\beta} = (\boldsymbol{X^T} \boldsymbol{X})^{-1}\boldsymbol{X^T} \boldsymbol{y}$$
Expanding the inverse of the product on the left side, we get:
$$ \boldsymbol{X^{-1}} \boldsymbol{(X^T)^{-1}}\boldsymbol{X^T} \boldsymbol{X} \boldsymbol{\beta} = (\boldsymbol{X^T} \boldsymbol{X})^{-1}\boldsymbol{X^T} \boldsymbol{y}$$
Then we collapse the product of a matrix with its inverse into the identity matrix, twice, on the left side:
$$ \boldsymbol{X^{-1}} \boldsymbol{I} \boldsymbol{X} \boldsymbol{\beta} = (\boldsymbol{X^T} \boldsymbol{X})^{-1}\boldsymbol{X^T} \boldsymbol{y}$$
$$ \boldsymbol{I} \boldsymbol{\beta} = (\boldsymbol{X^T} \boldsymbol{X})^{-1}\boldsymbol{X^T} \boldsymbol{y}$$
$$ \boldsymbol{\beta} = (\boldsymbol{X^T} \boldsymbol{X})^{-1}\boldsymbol{X^T} \boldsymbol{y}$$
On the right-hand side, we do not apply the same rule we applied to the left side (expanding the inverse of a product), although the right-hand side could be written as $ \boldsymbol{X^{-1}} (\boldsymbol{X^T})^{-1}\boldsymbol{X^T} \boldsymbol{y}$. The two terms in the middle appear to collapse into the identity matrix, leaving us with $ \boldsymbol{X^{-1}}\boldsymbol{y}$, which however does not appear to be conformable for multiplication. I am guessing this is the reason this last step is never done, but I am not sure and would like to understand precisely what rule of linear algebra I am breaking in doing this last step.
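The issue is the following. If you suppose that $X^TX$ is invertible, then indeed
$$(X^TX)^{-1}(X^TX) = I$$
by definition of the inverse of $X^TX$, and not because $(X^TX)^{-1} = X^{-1}(X^T)^{-1}$. The rule $(AB)^{-1} = B^{-1}A^{-1}$ holds only when $A$ and $B$ are both square and invertible. If $X$ is not a square matrix, $X^{-1}$ does not exist and this equality doesn't even make sense! So the expansion is not valid on the left side either; the left side simplifies to $\beta$ directly from the definition of the inverse.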
This is why you have to keep $(X^T X)^{-1}$ on the right-hand side of the equality.
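As a quick numerical illustration of the point above, here is a minimal NumPy sketch. The data are made up for the example (a hypothetical $5 \times 2$ design matrix with an intercept column, and a random $y$), and the variable names are my own. It checks that $X^TX$ can be inverted even though $X$ itself cannot, and that in the special case of a square, invertible $X$ the formula does reduce to $X^{-1}y$.

```python
import numpy as np

# Hypothetical data: n = 5 observations, k = 2 regressors (intercept + one variable).
rng = np.random.default_rng(0)
n, k = 5, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # shape (5, 2), not square
y = rng.normal(size=n)

# X^T X is k x k (square), so it can have an inverse.
XtX = X.T @ X
beta = np.linalg.inv(XtX) @ X.T @ y

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
same_as_lstsq = np.allclose(beta, beta_lstsq)

# X itself is 5 x 2, so X^{-1} is not defined; NumPy refuses to invert it.
try:
    np.linalg.inv(X)
    x_has_inverse = True
except np.linalg.LinAlgError:
    x_has_inverse = False

# Special case: a square, invertible X (here the first two rows of X).
# Only then is (X^T X)^{-1} = X^{-1} (X^T)^{-1}, and beta reduces to X^{-1} y.
Xs, ys = X[:k], y[:k]
beta_square = np.linalg.inv(Xs.T @ Xs) @ Xs.T @ ys
reduces_to_inverse = np.allclose(beta_square, np.linalg.inv(Xs) @ ys)

print(XtX.shape, x_has_inverse, same_as_lstsq, reduces_to_inverse)
```

In practice one would usually solve the normal equations with `np.linalg.solve` or `np.linalg.lstsq` rather than forming the inverse explicitly; the explicit inverse is used here only to mirror the formula in the question.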