Why does $\frac{\delta}{\delta\beta}y^TX\beta=\frac{\delta}{\delta\beta}\beta^TX^Ty$?


In linear regression the parameter vector $\beta$ of the model $y=X\beta + \epsilon$ can be estimated by minimizing the least-squares error $(y-X\beta)^{T}(y-X\beta)$. Setting the derivative to zero yields $\hat\beta = (X^TX)^{-1}X^Ty$ (assuming $X^TX$ is invertible), but to get there derivations use the fact that $\frac{\delta}{\delta w}\vec{a}^T\vec{w}=\frac{\delta}{\delta w}\vec{w}^T\vec{a}$.
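As a quick numeric sanity check of that closed form, the sketch below (using made-up random data) compares $(X^TX)^{-1}X^Ty$ against NumPy's least-squares solver:

```python
import numpy as np

# Sketch with arbitrary synthetic data: the normal-equations solution
# (X^T X)^{-1} X^T y should match NumPy's least-squares solver.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)        # (X^T X)^{-1} X^T y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference solver

print(np.allclose(beta_hat, beta_lstsq))
```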

$\frac{\delta}{\delta \beta}(y-X\beta)^T(y-X\beta)$
$=\frac{\delta}{\delta \beta}(y^Ty-y^TX\beta-\beta^TX^Ty+\beta^TX^TX\beta)$
$=\frac{\delta}{\delta \beta}(y^Ty-2\beta^TX^Ty+\beta^TX^TX\beta)$

As you can see, from the 2nd to the 3rd line they combine $\frac{\delta}{\delta \beta}(-y^TX\beta-\beta^TX^Ty)=\frac{\delta}{\delta \beta}(-2\beta^TX^Ty)$. I'm assuming this is because $\frac{\delta}{\delta\beta}y^TX\beta=\frac{\delta}{\delta\beta}\beta^TX^Ty$ — why is this the case?
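For context, once the two middle terms are combined, the rest of the derivation (assuming $X^TX$ is invertible) finishes in one step by setting the gradient to zero:

$$\frac{\delta}{\delta \beta}\left(y^Ty-2\beta^TX^Ty+\beta^TX^TX\beta\right) = -2X^Ty + 2X^TX\beta = 0 \quad\Longrightarrow\quad X^TX\beta = X^Ty \quad\Longrightarrow\quad \hat\beta = (X^TX)^{-1}X^Ty.$$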

On BEST ANSWER

Note that since $y^T X\beta\in\mathbb{R}$, it equals its own transpose: for any scalar $a\in\mathbb{R}$, $a^T = a$. Hence $(y^TX\beta)^T = \beta ^T X^T y$, and the two terms are equal.
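A small numeric check of this identity, with arbitrary data (a sketch, not part of the original derivation):

```python
import numpy as np

# Both y^T X beta and beta^T X^T y are the same scalar, since a
# 1x1 quantity equals its own transpose. Data here is arbitrary.
rng = np.random.default_rng(1)
n, p = 4, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

lhs = y @ X @ beta    # y^T X beta
rhs = beta @ X.T @ y  # beta^T X^T y

print(np.isclose(lhs, rhs))
```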

Or explicitly:

Let $X$ be an $n \times p$ matrix, and $\beta \in \mathbb{R}^p$. Note that $X\beta$ is a column vector of size $n$ whose $i$th element, for $i=1,\dots,n$, is $\sum_{j=1}^p x_{ij}\beta_j$, thus $$ y^T(X\beta) = \sum_{i=1}^n y_i\sum_{j=1}^p x_{ij}\beta_j = \sum_{i=1}^n \sum_{j=1}^p y_ix_{ij}\beta_j. $$ On the other hand, $\beta ^T X^T$ is a row vector of size $n$ whose $i$th entry, for $i=1,\dots,n$, is $\sum_{j=1}^p x_{ij}\beta_j$, thus $$ \beta^T X^Ty = \sum_{i=1}^n\sum_{j=1}^p x_{ij}\beta_j y_i = \sum_{i=1}^n \sum_{j=1}^p y_ix_{ij}\beta_j. $$
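The explicit double sums above can also be checked directly with loops against the matrix products (a sketch with arbitrary data):

```python
import numpy as np

# The double sum over i and j of y_i * x_ij * beta_j equals both
# y^T X beta and beta^T X^T y. Data is arbitrary.
rng = np.random.default_rng(2)
n, p = 5, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

double_sum = sum(y[i] * X[i, j] * beta[j]
                 for i in range(n) for j in range(p))

print(np.isclose(double_sum, y @ X @ beta))
print(np.isclose(double_sum, beta @ X.T @ y))
```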