I am working through Chapter 2 of The Elements of Statistical Learning (Hastie, Tibshirani, Friedman). I have trouble following the authors as they analytically derive the regression coefficients $\beta$ of the regression function $f$, where $x$ is a column vector of predictors. In the book, these are equations (2.9), (2.15), and (2.16).
$$ f:\mathbb{R}^p\rightarrow\mathbb{R},\quad f(x) = x^T\beta $$
This is done by finding the minimum of the expected prediction error $\mathrm{EPE}$:
$$ \frac{\partial \mathrm{EPE}}{\partial \beta} = 0 \implies \text{Minimal EPE for this value of}\ \beta $$
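For reference, if I carry the differentiation through myself, I believe it should recover equation (2.16) of the book (a sketch, interchanging differentiation and expectation):

$$ \mathrm{EPE}(\beta) = E\big[(Y-X^T\beta)^2\big] $$

$$ \frac{\partial \mathrm{EPE}}{\partial \beta} = -2\,E\big[X(Y-X^T\beta)\big] = 0 \implies E[XX^T]\,\beta = E[XY] \implies \beta = \big(E[XX^T]\big)^{-1}E[XY] $$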
This problem is discussed in a question on Cross Validated. There, however, the OP proposes the following transformations to arrive at a solution, and I don't understand the first transformation (marked by (*)). $X\in\mathbb{R}^p$ is a real-valued random vector of predictors, and $Y \in\mathbb{R}$ is a real-valued random output variable.
Quoting the question, with $\bigcirc^T$ inserted as suggested in the answer by @bill_e:
$$ \mathrm{EPE}(f) = E[(Y-f(X))^2] $$ $$ \mathrm{EPE}(f) = E[(Y-X^T\beta)^T(Y-X^T\beta)] \tag{*}$$ [...]
Why is $(Y-X^T\beta)^2 = (Y-X^T\beta)^T(Y-X^T\beta)$ here? I always thought that $A^2$ for a vector is the dot product of the vector with itself, so $A^2 = |A|\cdot|A|\cdot \cos 0° = |A|^2$.
I am more used to writing $$\|A\|^2 = A^TA$$
Notice that here we have $f(X)=X^T\beta$. Letting $A = Y-X^T\beta$,
$$\| Y - f(X)\|^2 = \|Y-X^T\beta \|^2=\|A\|^2=A^TA=(Y-X^T\beta)^T(Y-X^T\beta)$$
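A quick numerical sanity check may help (a minimal NumPy sketch with made-up dimensions, not from the book). Since $Y$ and $X^T\beta$ are scalars here, $A$ is effectively a $1\times 1$ vector, so $A^TA$ reduces to the ordinary square $A^2$; for a genuine vector the same identity $\|A\|^2 = A^TA$ holds:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 5
x = rng.standard_normal(p)     # predictor vector X in R^p (made-up data)
beta = rng.standard_normal(p)  # coefficient vector beta in R^p
y = rng.standard_normal()      # scalar response Y

# Residual A = Y - X^T beta is a scalar here,
# so A^T A is just A*A, which equals A^2 = |A|^2.
a = y - x @ beta
assert np.isclose(a * a, a ** 2)

# For a genuine vector A, the same identity ||A||^2 = A^T A holds:
v = rng.standard_normal(p)
assert np.isclose(v @ v, np.linalg.norm(v) ** 2)
```

So the transpose in (*) is just the general vector notation $\|A\|^2 = A^TA$, which happens to be applied to a $1\times 1$ case.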