I am trying to understand the following part from the book Fundamentals of Statistical Signal Processing by Steven Kay.
It is written that if the parameter $\theta$ is a vector of dimension $p \times 1$, then we have $\textbf{s} = \textbf{H} \theta$, where $\textbf{H}$ is a known matrix of dimension $N \times p$, and hence $\textbf{s}$ is a vector of dimension $N \times 1$.
The LSE is then found by minimizing $$ \begin{align} J(\theta) &= \sum_{n=0}^{N-1}(x[n]-s[n])^2 \label{1}\tag{1}\\ J(\theta) &= (\textbf{x}-\textbf{H}\theta)^T (\textbf{x}-\textbf{H}\theta)\label{2}\tag{2} \end{align} $$ I understand equation (1), but I do not see how equation (2) is obtained.
Any help in this regard will be highly appreciated.
Assuming that $\mathbf{x}=(x[0], x[1], \ldots, x[N-1])^T$, with a similar structure for $\mathbf{s}$, and given that $\mathbf{s}=\mathbf{H} \theta$, we can write the sum in (1) in vector form.
The expression $(\mathbf{x}-\mathbf{s})^T(\mathbf{x}-\mathbf{s})$ represents the squared Euclidean norm (i.e., the sum of squared elements) of the vector $(\mathbf{x}-\mathbf{s})$. In more expanded form, this is: $$ (\mathbf{x}-\mathbf{s})^T(\mathbf{x}-\mathbf{s})=\sum_{n=0}^{N-1}(x[n]-s[n])^2 $$ which is equivalent to \eqref{1}. Substituting $\mathbf{s}=\mathbf{H}\theta$ into this expression then gives exactly \eqref{2}: $J(\theta)=(\mathbf{x}-\mathbf{H}\theta)^T(\mathbf{x}-\mathbf{H}\theta)$.
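If it helps to see the identity numerically, here is a small NumPy sketch (the dimensions and random values are just illustrative assumptions) comparing the elementwise sum of squares from (1) with the vector form from (2):

```python
import numpy as np

# Illustrative sizes (not from the book): N observations, p parameters
rng = np.random.default_rng(0)
N, p = 5, 2
H = rng.standard_normal((N, p))   # known N x p observation matrix
theta = rng.standard_normal(p)    # p x 1 parameter vector
x = rng.standard_normal(N)        # observed data vector

s = H @ theta                     # signal model s = H*theta
e = x - s                         # error vector x - H*theta

J_sum = sum((x[n] - s[n])**2 for n in range(N))  # equation (1)
J_vec = e.T @ e                                  # equation (2)

print(np.isclose(J_sum, J_vec))  # the two forms agree
```

Both forms compute the same squared Euclidean norm of the error vector, which is why (1) and (2) define the same cost $J(\theta)$.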