I am trying to follow the analytical derivation of linear regression in The Elements of Statistical Learning (Hastie, Tibshirani, Friedman), where the authors use the regression function
$$f(x) = x^T\beta$$ and want to derive the set of regression coefficients $\beta$ such that the expected prediction error (EPE), for a random input variable $X\in\mathbb{R}^p$ and a random output variable $Y\in\mathbb{R}$, is minimized.
The book does not spell out the derivation. I found this question on Cross Validated, which states the following transformations (corrected according to remarks in the answer there):
$$ EPE(f) = E\left[(Y-f(X))^2\right] = E\left[(Y-X^T\beta)^T(Y-X^T\beta)\right] $$ $$ \frac{\partial{EPE(f)}}{\partial \beta} = E\left[X(Y-X^T\beta)\right]$$
I do not understand the second line. It looks as if only one of the factors $(Y-X^T\beta)$ is differentiated while the other stays unchanged.
Abbreviating $g(X):=Y-X^T\beta$, with $g'(X)=\partial g/\partial \beta = X^T$, the objective becomes
$$EPE = E[g(X)^Tg(X)],$$
and using the product rule I would have expected something like
\begin{align} \frac{\partial{EPE}}{\partial \beta} &= E\left[ g'(X)^Tg(X)+g(X)^Tg'(X) \right] \\ &= E\left[ X(Y-X^T\beta)+(Y-X^T\beta)^T X^T \right] \end{align}
Is this an error in the question at Cross Validated, or does it result from some rule about differentiating inside an expectation that I am not aware of?
Note that $g(X):=Y-X^T\beta$ is a scalar, so $\frac\partial{\partial\beta}g(X)$ and $\frac\partial{\partial\beta}[g(X)^T]$ are both equal to $-X$; notice also that the two terms $X(Y-X^T\beta)$ and $(Y-X^T\beta)^T X$ are in fact the same, since a scalar commutes with a vector. The Cross Validated derivation is correct; it is just missing a factor of $2$ (actually $-2$).
More directly, you can differentiate using the chain rule: $$\frac\partial{\partial\beta}(Y-X^T\beta)^2=2(Y-X^T\beta)\frac\partial{\partial\beta}(Y-X^T\beta)=2(Y-X^T\beta)(-X).$$
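You can also check the resulting gradient formula numerically. The sketch below (my own check, not from the book or the linked question) compares the empirical version of $-2\,E\left[X(Y-X^T\beta)\right]$ against a central finite-difference gradient of the empirical squared error, on simulated data with an arbitrary choice of $\beta$:

```python
import numpy as np

# Simulated data: rows of X are draws of the random input, y the output.
rng = np.random.default_rng(0)
n, p = 1000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

beta = np.array([0.3, 0.1, -0.7])  # arbitrary point at which to differentiate

def epe(b):
    """Empirical version of E[(Y - X^T beta)^2]."""
    r = y - X @ b
    return np.mean(r ** 2)

# Analytic gradient: -2 E[X (Y - X^T beta)], estimated by the sample mean.
grad_analytic = -2 * X.T @ (y - X @ beta) / n

# Central finite-difference gradient, one component at a time.
eps = 1e-6
grad_fd = np.array([
    (epe(beta + eps * np.eye(p)[j]) - epe(beta - eps * np.eye(p)[j])) / (2 * eps)
    for j in range(p)
])

print(np.allclose(grad_analytic, grad_fd, atol=1e-6))
```

If the factor of $-2$ were dropped, as in the quoted derivative $E\left[X(Y-X^T\beta)\right]$, the two gradients would disagree by exactly that factor; note that this does not affect the minimizer, since setting the gradient to zero kills any constant factor.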