I am trying to follow the analytical derivation of linear regression in The Elements of Statistical Learning (Hastie, Tibshirani, Friedman), where the authors use the regression function
$$f(x) = x^T\beta$$ and want to derive the set of regression coefficients $\beta$ such that the expected prediction error (EPE), for a random input variable $X\in\mathbb{R}^p$ and a random output variable $Y\in\mathbb{R}$, is minimized.
The book does not spell out the derivation. I found this question on Cross Validated, which states the following transformations (corrected according to remarks in the answer there):
$$ EPE(f) = E\left[(Y-f(X))^2\right] = E\left[(Y-X^T\beta)^T(Y-X^T\beta)\right] $$ $$ \frac{\partial{EPE(f)}}{\partial \beta} = E\left[X(Y-X^T\beta)\right]$$
I do not understand the second line. It looks as if only one of the factors $(Y-X^T\beta)$ is differentiated while the other stays unchanged.
Abbreviating $g(X):=Y-X^T\beta$, with $g'(X)=\partial g/\partial \beta = X^T$, the objective becomes
$$EPE = E[g(X)^Tg(X)],$$
and using the product rule I would have expected something like
\begin{align} \frac{\partial{EPE}}{\partial \beta} &= E\left[ g'(X)^Tg(X)+g(X)^Tg'(X) \right] \\ &= E\left[ X(Y-X^T\beta)+(Y-X^T\beta)^T X^T \right] \end{align}
Is this an error in the question at Cross Validated, or does it result from some rule about differentiating inside an expectation that I am not aware of?
Note that $g(X):=Y-X^T\beta$ is a scalar, so $\frac\partial{\partial\beta}g(X)$ and $\frac\partial{\partial\beta}[g(X)^T]$ are both equal to $-X$; notice also that the two terms $X(Y-X^T\beta)$ and $(Y-X^T\beta)^T X$ are in fact the same, since a scalar commutes with a vector. The Cross Validated derivation is correct; it is just missing a factor of $2$ (actually $-2$).
More directly, you can differentiate using the chain rule: $$\frac\partial{\partial\beta}(Y-X^T\beta)^2=2(Y-X^T\beta)\frac\partial{\partial\beta}(Y-X^T\beta)=2(Y-X^T\beta)(-X).$$
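You can also check the resulting gradient formula numerically. The sketch below (my own check, not from the book or the linked question) compares the empirical version of $-2\,E\left[X(Y-X^T\beta)\right]$ against a central finite-difference gradient of the empirical squared error, on simulated data with an arbitrary choice of $\beta$:

```python
import numpy as np

# Simulated data: rows of X are draws of the random input, y the output.
rng = np.random.default_rng(0)
n, p = 1000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

beta = np.array([0.3, 0.1, -0.7])  # arbitrary point at which to differentiate

def epe(b):
    """Empirical version of E[(Y - X^T beta)^2]."""
    r = y - X @ b
    return np.mean(r ** 2)

# Analytic gradient: -2 E[X (Y - X^T beta)], estimated by the sample mean.
grad_analytic = -2 * X.T @ (y - X @ beta) / n

# Central finite-difference gradient, one component at a time.
eps = 1e-6
grad_fd = np.array([
    (epe(beta + eps * np.eye(p)[j]) - epe(beta - eps * np.eye(p)[j])) / (2 * eps)
    for j in range(p)
])

print(np.allclose(grad_analytic, grad_fd, atol=1e-6))
```

If the factor of $-2$ were dropped, as in the quoted derivative $E\left[X(Y-X^T\beta)\right]$, the two gradients would disagree by exactly that factor; note that this does not affect the minimizer, since setting the gradient to zero kills any constant factor.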