Suppose that we have \begin{equation} (\boldsymbol y - \boldsymbol X \boldsymbol \beta)^T(\boldsymbol y - \boldsymbol X \boldsymbol \beta) \end{equation} where $\boldsymbol y$ is an $m$ by $1$ column vector, $\boldsymbol \beta$ is an $n$ by $1$ column vector, and $\boldsymbol X$ is an $m$ by $n$ matrix. Suppose that $\boldsymbol y$ and $\boldsymbol X$ are constants.
We want to find a vector $\boldsymbol \beta$ that minimizes the above expression. To do that, we can expand it and obtain
\begin{equation} \boldsymbol y^T \boldsymbol y - \boldsymbol y^T \boldsymbol X \boldsymbol \beta - \boldsymbol \beta^T \boldsymbol X^T \boldsymbol y + \boldsymbol \beta^T \boldsymbol X^T\boldsymbol X \boldsymbol \beta \end{equation}
and then write out all the components and differentiate with respect to each $\beta_i$, which is slightly messy.
Is there an easier way to do it? Specifically, are there identities for differentiating, with respect to $\boldsymbol \beta$, expressions of the form
\begin{equation} \boldsymbol \beta^T \boldsymbol X^T\boldsymbol X \boldsymbol \beta \end{equation}
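Using the Frobenius (inner) product, written $A:B = \operatorname{tr}(A^TB)$ (for column vectors this is just the dot product $a^Tb$), you can write down the function, its differential, and its gradient $$\eqalign{ f &= (X\beta-y):(X\beta-y) \cr df &= 2\,(X\beta-y):X\,d\beta = 2X^T(X\beta-y):d\beta \cr \frac{\partial f}{\partial\beta} &= 2X^T(X\beta-y) \cr }$$ Then set the gradient to zero and solve for $\beta$, assuming $X^TX$ is invertible (that is, $X$ has full column rank) $$\eqalign{ X^TX\beta &= X^Ty \cr \beta &= (X^TX)^{-1}X^Ty \cr }$$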
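As a sanity check, the gradient formula and the closed-form minimizer can be verified numerically. The following is a minimal sketch using NumPy, not part of the derivation itself: the helper names (`loss`, `gradient`, `numerical_gradient`, `solve_normal_equations`), the random test data, and the sizes `m = 50`, `n = 3` are all assumptions made for illustration. It also assumes $X$ has full column rank, which holds with probability one for the random matrix used here.

```python
import numpy as np


def loss(X, y, beta):
    """Squared residual norm (y - X beta)^T (y - X beta)."""
    r = X @ beta - y
    return float(r @ r)


def gradient(X, y, beta):
    """Analytic gradient 2 X^T (X beta - y) from the derivation."""
    return 2.0 * X.T @ (X @ beta - y)


def numerical_gradient(X, y, beta, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(beta)
    for i in range(beta.size):
        step = np.zeros_like(beta)
        step[i] = eps
        g[i] = (loss(X, y, beta + step) - loss(X, y, beta - step)) / (2 * eps)
    return g


def solve_normal_equations(X, y):
    """Solve X^T X beta = X^T y without forming an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical test data: X is m by n, y has length m, beta has length n.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
beta = rng.normal(size=n)

# Compare the analytic gradient with finite differences at a random point.
grad_error = np.max(np.abs(gradient(X, y, beta) - numerical_gradient(X, y, beta)))

# Compare the normal-equations solution with NumPy's least-squares solver.
beta_hat = solve_normal_equations(X, y)
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("max gradient discrepancy:", grad_error)
print("solutions agree:", np.allclose(beta_hat, beta_lstsq))
```

The sketch solves the linear system $X^TX\beta = X^Ty$ with `np.linalg.solve` rather than computing $(X^TX)^{-1}$ explicitly, which is the usual numerical choice; for ill-conditioned $X$, `np.linalg.lstsq` is generally the safer option.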