I want to solve a least squares problem in the form of $\mathbf{A}\vec{x}=\vec{b}$, where $\mathbf{A}$ is an $m\times n$ matrix associated to the transformation $T:\mathbb{R}^n \to \mathbb{R}^m$; and where $\vec{b} \notin C(\mathbf{A})$, meaning that there is no exact solution for $\vec{x}$.
Even though $\mathbf{A}\vec{x}=\vec{b}$ has no solution, I know I can estimate the $\hat{\vec{x}}$ that would minimize the error, and that after some algebra (sorry, I have no reference at hand), it can be expressed as:
$$\mathbf{A}^{\intercal}\mathbf{A}\hat{\vec{x}} = \mathbf{A}^{\intercal}\vec{b}$$ My problem is that all textbooks I have come across stop here. They state that the new formulation of the problem has a solution (that, I understand), but they say nothing about solving for $\hat{\vec{x}}$. Typically I would go about and solve like this:
$$\hat{\vec{x}} = (\mathbf{A}^{\intercal}\mathbf{A})^{-1}\mathbf{A}^{\intercal}\vec{b}$$
However, this requires that $\mathbf{A}^{\intercal}\mathbf{A}$ be invertible.
Question: is $\mathbf{A}^{\intercal}\mathbf{A}$ always invertible? If so, how can I prove it?
Edit: for the particular case I'm interested in, $\mathbf{A}$ has more rows than columns, as it comes from an over-determined system.
Edit 2: I have also checked the formulation of my problem. All columns are linearly independent, meaning that for my particular case, $\mathbf{A}$ has full rank.
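For concreteness, here is a minimal NumPy sketch of the normal-equations solve for an over-determined, full-column-rank system (the data below is made up for illustration):

```python
import numpy as np

# Hypothetical over-determined system: 4 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 1.0, 3.0])  # b does not lie in C(A)

# Normal equations: (A^T A) x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lstsq))  # True: both give the same minimizer
```

In practice `np.linalg.lstsq` (or a QR factorization) is preferred over forming $\mathbf{A}^{\intercal}\mathbf{A}$ explicitly, since the normal equations square the condition number of the problem.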
By taking into account all the provided answers and (very useful) comments, I was able to refine my online search and stumbled upon this video by Khan Academy. With it, I was able to progress further until I could write down a proof that I myself was able to understand. Although I'm not sure if it's OK to answer my own question, I wanted to post it nonetheless to (i) have it peer-reviewed and (ii) leave it here for the community.
Postulate
For any matrix $\mathbf{A}$ of size $m\times n$, the product of its transpose with itself, $\mathbf{A}^{\intercal}\mathbf{A}$, is invertible if $m\geq n$ and $\mathbf{A}$ has full rank; that is, if all $n$ columns of $\mathbf{A}$ are linearly independent.
Proof
Assuming that $\mathbf{A}_{m\times n}$ has full rank (i.e. all its columns are L.I.), we know that its nullspace will only contain the zero vector $\mathbf{0}_n$. $$ \operatorname{rank}(\mathbf{A})=n ~~\Leftrightarrow~~ N(\mathbf{A})=\{\mathbf{0}_n\} ~~\Leftrightarrow~~ \mathbf{A}\mathbf{x} = \mathbf{0}_m \text{ only for } \mathbf{x} = \mathbf{0}_n $$
Consequently, $\mathbf{A}^{\intercal}\mathbf{A}$ will also be full rank. It has dimensions $n \times n$, and it has the same nullspace as $\mathbf{A}$: if $\mathbf{A}^{\intercal}\mathbf{A}\mathbf{x} = \mathbf{0}_n$, then $\mathbf{x}^{\intercal}\mathbf{A}^{\intercal}\mathbf{A}\mathbf{x} = \|\mathbf{A}\mathbf{x}\|^2 = 0$, which forces $\mathbf{A}\mathbf{x} = \mathbf{0}_m$, and hence, by the full-rank assumption, $\mathbf{x} = \mathbf{0}_n$. (The other inclusion is immediate: $\mathbf{A}\mathbf{x}=\mathbf{0}_m$ implies $\mathbf{A}^{\intercal}\mathbf{A}\mathbf{x}=\mathbf{0}_n$.) Since the two matrices share the same nullspace and the same number of columns, the rank-nullity theorem gives: $$ \operatorname{rank}(\mathbf{A}^{\intercal}\mathbf{A}) = \operatorname{rank}(\mathbf{A})=n $$
Finally, we can conclude that $\mathbf{A}^{\intercal}\mathbf{A}$ is a square matrix with linearly independent columns, which is a sufficient condition for invertibility.
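The claim can also be checked numerically. The sketch below draws a random $6\times 3$ Gaussian matrix (which has full column rank with probability 1) and verifies that $\mathbf{A}^{\intercal}\mathbf{A}$ has the same rank as $\mathbf{A}$ and is invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))  # generic A: full column rank almost surely

AtA = A.T @ A

# rank(A) == rank(A^T A) == n
print(np.linalg.matrix_rank(A))    # 3
print(np.linalg.matrix_rank(AtA))  # 3

# Invertibility check: AtA @ inv(AtA) recovers the identity.
print(np.allclose(AtA @ np.linalg.inv(AtA), np.eye(n)))  # True
```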