I am working through Chapter 2 of The Elements of Statistical Learning (Hastie, Tibshirani, Friedman). I have trouble following the authors as they analytically derive the regression coefficients $\beta$ of the regression function $f$, where $x$ is a column vector of predictors. In the book, these are equations (2.9), (2.15), and (2.16).
$$ f:\mathbb{R}^p\rightarrow\mathbb{R},\quad f(x) = x^T\beta $$
This is done by finding the minimum of the expected prediction error $\mathrm{EPE}$:
$$ \frac{\partial \mathrm{EPE}}{\partial \beta} = 0 \implies \text{Minimal EPE for this value of}\ \beta $$
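For reference, if I carry the differentiation through myself, I believe it should recover equation (2.16) of the book (a sketch, interchanging differentiation and expectation):

$$ \mathrm{EPE}(\beta) = E\big[(Y-X^T\beta)^2\big] $$

$$ \frac{\partial \mathrm{EPE}}{\partial \beta} = -2\,E\big[X(Y-X^T\beta)\big] = 0 \implies E[XX^T]\,\beta = E[XY] \implies \beta = \big(E[XX^T]\big)^{-1}E[XY] $$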
This problem is discussed in a question on Cross Validated. There, however, the OP proposes the following transformations to arrive at a solution, and I don't understand the first transformation (marked by (*)). $X\in\mathbb{R}^p$ is a real-valued random vector of predictors, and $Y \in\mathbb{R}$ is a real-valued random output variable.
Quoting the question, with $\bigcirc^T$ inserted as suggested in the answer by @bill_e:
$$ \mathrm{EPE}(f) = E[(Y-f(X))^2] $$ $$ \mathrm{EPE}(f) = E[(Y-X^T\beta)^T(Y-X^T\beta)] \tag{*}$$ [...]
Why is $(Y-X^T\beta)^2 = (Y-X^T\beta)^T(Y-X^T\beta)$ here? I always thought that $A^2$ for a vector is the dot product of the vector with itself, so $A^2 = |A|\cdot|A|\cdot \cos 0° = |A|^2$.
I am more used to writing $$\|A\|^2 = A^TA$$
Notice that here we have $f(X)=X^T\beta$. Letting $A = Y-X^T\beta$,
$$\| Y - f(X)\|^2 = \|Y-X^T\beta \|^2=\|A\|^2=A^TA=(Y-X^T\beta)^T(Y-X^T\beta)$$
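A quick numerical sanity check may help (a minimal NumPy sketch with made-up dimensions, not from the book). Since $Y$ and $X^T\beta$ are scalars here, $A$ is effectively a $1\times 1$ vector, so $A^TA$ reduces to the ordinary square $A^2$; for a genuine vector the same identity $\|A\|^2 = A^TA$ holds:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 5
x = rng.standard_normal(p)     # predictor vector X in R^p (made-up data)
beta = rng.standard_normal(p)  # coefficient vector beta in R^p
y = rng.standard_normal()      # scalar response Y

# Residual A = Y - X^T beta is a scalar here,
# so A^T A is just A*A, which equals A^2 = |A|^2.
a = y - x @ beta
assert np.isclose(a * a, a ** 2)

# For a genuine vector A, the same identity ||A||^2 = A^T A holds:
v = rng.standard_normal(p)
assert np.isclose(v @ v, np.linalg.norm(v) ** 2)
```

So the transpose in (*) is just the general vector notation $\|A\|^2 = A^TA$, which happens to be applied to a $1\times 1$ case.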