I was reading through linear regression but I can't get my head around the notation.
Given a set of points $(x_1, y_1), \ldots, (x_n,y_n) \in \mathbf{R}^2$, the least-squares line $y = ax + b$ can be found by solving
$$\left(\sum_{i=1}^n (x_i)^2\right)~a + \left(\sum_{i=1}^n(x_i)\right)~b = \sum_{i=1}^{n} x_i y_i$$ $$\left(\sum_{i=1}^n x_i\right)~a + n~b = \sum_{i=1}^{n} y_i$$
Up to here I understand the derivation, and expressing this in matrix form is simple:
$$ \begin{bmatrix} \sum_{i=1}^n (x_i)^2 & \sum_{i=1}^n(x_i) \\ \sum_{i=1}^n x_i & n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{bmatrix} $$
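As a sanity check of this $2\times 2$ system, here is a small NumPy sketch (the data points are made up for illustration; they lie exactly on $y = 2x + 1$, so the solver should recover $a = 2$, $b = 1$):

```python
import numpy as np

# Hypothetical sample data; these points lie exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
n = len(x)

# Build the 2x2 system from the sums above:
#   [ sum(x_i^2)  sum(x_i) ] [a]   [ sum(x_i y_i) ]
#   [ sum(x_i)    n        ] [b] = [ sum(y_i)     ]
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    n        ]])
rhs = np.array([np.sum(x * y), np.sum(y)])

a, b = np.linalg.solve(A, rhs)
print(a, b)  # → 2.0 1.0 (slope a, intercept b)
```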
The problem I'm having is the general case, where each $x_i$ is a vector (say $x_i \in \mathbf{R}^p$): how do I get from the previous representation to the normal equation
$$\hat \beta=(X^TX)^{-1}X^T y$$
where I guess $\hat \beta$ is the vector with all the parameters of the linear regression. I don't see how to go from the sums to a pure matrix notation.
Write
$$X=\left[\begin{array}{cc} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right]$$
and likewise for $y$.
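To see the connection explicitly: with this $X$, stack the responses as $y = (y_1, \ldots, y_n)^T$ and take $\beta = (b, a)^T$ (intercept first, to match the column of ones). Then

$$X^TX = \begin{bmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{bmatrix}, \qquad X^Ty = \begin{bmatrix} \sum_{i=1}^n y_i \\ \sum_{i=1}^n x_i y_i \end{bmatrix},$$

so $X^TX\beta = X^Ty$ is exactly your $2\times 2$ system, with the two equations (and the two unknowns) in the opposite order. When $x_i \in \mathbf{R}^p$, each row of $X$ becomes $(1, x_{i1}, \ldots, x_{ip})$, and the same formula $\hat\beta = (X^TX)^{-1}X^Ty$ gives all $p+1$ parameters at once.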