ESL: Multiplying Matrix $X$ by $X^T = 2X^T$?


In Elements of Statistical Learning, we differentiate $RSS(\beta) = (y - X\beta)^T (y - X\beta)$ (equation 2.4) with respect to $\beta$ and set the derivative to zero to get $X^T(y - X\beta) = 0$ (equation 2.5).

According to some (link), this is because $$(y - X\beta)^T(y - X\beta) = y^T y -2\beta^T X^T y+\beta^T X^T X \beta$$ I understand how differentiating from here would lead to equation 2.5. What I don't understand is where the $-2\beta^T X^T y$ term comes from. Don't we get $-X^T\beta^T y$ and $-X\beta y$ when multiplying out $(y - X\beta)^T(y - X\beta)$? Combining those terms shouldn't result in $-2\beta^T X^T y$, given that $X$ isn't guaranteed to be symmetric, right?


There are 2 best solutions below


Note that in this context $y$ is an $N \times 1$ column vector, $\beta$ is a $(p + 1) \times 1$ column vector, and $X$ is an $N \times (p+1)$ matrix, so $\beta^T X^T y$ is a $1 \times 1$ scalar. A scalar always equals its own transpose, so $\beta^T X^T y = (\beta^T X^T y)^T = y^T X \beta$ whether or not $X$ is symmetric.
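For reference, here is a sketch of the full expansion, using only $(X\beta)^T = \beta^T X^T$ and the fact that the two cross terms are equal scalars. The last line is the gradient with respect to $\beta$ under the usual matrix-calculus conventions; setting it to zero gives the form in the question.

$$
\begin{aligned}
(y - X\beta)^T(y - X\beta) &= (y^T - \beta^T X^T)(y - X\beta) \\
&= y^T y - y^T X\beta - \beta^T X^T y + \beta^T X^T X\beta \\
&= y^T y - 2\beta^T X^T y + \beta^T X^T X\beta, \\
\frac{\partial RSS}{\partial \beta} &= -2X^T y + 2X^T X\beta = -2X^T(y - X\beta).
\end{aligned}
$$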


Multiplying out, the two cross terms should be $-\beta^TX^Ty$ and $-y^TX\beta$. Note that each of these is a scalar, and each is the transpose of the other.

Hence $\beta^TX^Ty=y^TX\beta$.
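If it helps, the identity can also be checked numerically. The following is a minimal sketch using NumPy with random data; the sizes `N = 5` and `p = 2`, the seed, and the variable names are arbitrary choices for illustration, not anything from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 2                              # arbitrary illustrative sizes

X = rng.normal(size=(N, p + 1))          # N x (p+1): not square, so not symmetric
y = rng.normal(size=(N, 1))              # N x 1 column vector
beta = rng.normal(size=(p + 1, 1))       # (p+1) x 1 column vector

# The two cross terms are 1x1 matrices; .item() extracts the scalar.
cross_1 = (beta.T @ X.T @ y).item()
cross_2 = (y.T @ X @ beta).item()

# RSS computed directly and via the expanded form.
residual = y - X @ beta
rss_direct = (residual.T @ residual).item()
rss_expanded = (y.T @ y - 2 * beta.T @ X.T @ y + beta.T @ X.T @ X @ beta).item()

print(np.isclose(cross_1, cross_2))          # cross terms agree
print(np.isclose(rss_direct, rss_expanded))  # expansion matches the direct RSS
```

Both checks hold even though $X$ here is rectangular, which shows that symmetry of $X$ plays no role.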