Derivative of matrix equation


In the source I'm following, we have:

$$ \phi: \beta \mapsto \Vert y - X \beta \Vert^2 = \Vert y \Vert^2 - 2 y^T X \beta + \beta^T X^T X \beta $$

$$ \dfrac {\partial \phi} {\partial \beta} = -2y^TX + \beta^TX^TX + (X^TX\beta)^T = -2y^TX + 2\beta^TX^TX $$

$$ \dfrac {\partial^2 \phi} {\partial \beta^2} = 2X^TX$$

I am confused about how to get the first derivative. I understand that $\Vert y \Vert^2$ disappears because it does not depend on $\beta$, but I don't understand the rest.

Can someone explain it to me or redirect me to a good resource if this is a property or something?

Thank you.



Best answer:

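Note that
\begin{align}
\phi(\beta) &= (y-X\beta)'(y-X\beta)\\
&= y'y + \beta'X'X\beta - 2\beta'X'y,
\end{align}
where $A = X'X$ is a symmetric square matrix of order $p+1$ (i.e., $(p+1)\times(p+1)$). Thus $\beta'X'X\beta = \beta'A\beta$ is a quadratic form, and since $a_{ij} = a_{ji}$ you can rewrite it as
$$ \beta'A\beta = \sum_j\sum_i \beta_i\beta_j a_{ij} = \sum_j \beta_j^2 a_{jj} + 2\sum_{i<j}\beta_i\beta_j a_{ij}. $$
Taking the derivative with respect to a single component $\beta_k$ gives
$$ \frac{\partial}{\partial \beta_k}(\beta'A\beta) = 2\beta_k a_{kk} + 2\sum_{i \neq k}\beta_i a_{ik} = 2\sum_i a_{ki}\beta_i = 2(A\beta)_k, $$
and stacking these for $k = 1, \dots, p+1$ gives
$$ \frac{\partial}{\partial \beta}(\beta'A\beta) = 2A\beta, $$
i.e.,
$$ \frac{\partial}{\partial \beta}(\beta'X'X\beta) = 2X'X\beta. $$
Similarly, $\beta'X'y = \sum_j \beta_j (X'y)_j$, so $\frac{\partial}{\partial \beta}(\beta'X'y) = X'y$, and altogether $\frac{\partial \phi}{\partial \beta} = 2X'X\beta - 2X'y$.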
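If you want to convince yourself of $\frac{\partial}{\partial\beta}(\beta'A\beta) = 2A\beta$ numerically, here is a minimal sketch in plain Python (no NumPy). The matrix `X`, the vector `beta`, and all helper functions are made-up illustrations, not part of the answer above. Central differences are used because for a quadratic function they are exact up to floating-point rounding.

```python
# Sketch: numerically check d/dbeta (beta' A beta) = 2 A beta for A = X'X.
# Pure Python; X and beta are hypothetical example values.

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A B for list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def quad_form(A, beta):
    """beta' A beta = sum_i sum_j beta_i beta_j a_ij."""
    return sum(b * av for b, av in zip(beta, matvec(A, beta)))


def numeric_grad(f, beta, h=1e-4):
    """Central finite differences, one component beta_k at a time."""
    grad = []
    for k in range(len(beta)):
        up = list(beta)
        up[k] += h
        down = list(beta)
        down[k] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


# Hypothetical design matrix: intercept column plus two predictors (p + 1 = 3).
X = [[1.0, 2.0, -1.0],
     [1.0, 0.5, 3.0],
     [1.0, -1.5, 0.0],
     [1.0, 4.0, 2.0]]
beta = [0.3, -1.2, 0.7]

A = matmul(transpose(X), X)                    # A = X'X, symmetric (p+1) x (p+1)
analytic = [2 * v for v in matvec(A, beta)]    # 2 A beta
numeric = numeric_grad(lambda b: quad_form(A, b), beta)

for k, (a, n) in enumerate(zip(analytic, numeric)):
    print(f"beta_{k}: analytic {a:.6f}  numeric {n:.6f}")
```

The two columns agree to many decimal places; the symmetry of $A$ is what lets the two cross terms $a_{ik}$ and $a_{ki}$ combine into the factor of 2.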

For the second derivative, note that $X'X\beta$ can be expressed as
$$ X'X\beta = \sum_{j=1}^{p+1} C_j(X'X)\beta_j, $$
where $C_j(X'X)$ is the $j$-th column of $X'X$. Differentiating with respect to $\beta_j$ leaves only the $j$-th column, hence
$$ \frac{\partial}{\partial \beta} (X'X\beta) = [C_1(X'X), \dots, C_{p+1}(X'X)] = X'X, $$
so differentiating the gradient term $2X'X\beta$ once more gives the Hessian $\frac{\partial^2 \phi}{\partial \beta^2} = 2X'X$.
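The column argument can be mirrored directly in code: perturb $\beta$ by the unit vector $e_j$ and record how $X'X\beta$ changes. This is a small sketch with hypothetical integer data (so every step is exact) and illustrative helper names; because the map is linear, each difference is exactly the column $C_j(X'X)$.

```python
# Sketch: build the Jacobian of beta -> X'X beta one column at a time.
# Shifting beta by e_j changes a linear map's output by exactly its j-th column.

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A B for list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def jacobian_columns(f, beta):
    """Column j = f(beta + e_j) - f(beta); exact when f is linear (or affine)."""
    base = f(beta)
    cols = []
    for j in range(len(beta)):
        shifted = list(beta)
        shifted[j] += 1
        cols.append([a - b for a, b in zip(f(shifted), base)])
    return transpose(cols)  # place the columns side by side


# Hypothetical integer data, so the comparison is exact.
X = [[1, 2, -1],
     [1, 0, 3],
     [1, -1, 0],
     [1, 4, 2]]
beta = [3, -2, 5]

A = matmul(transpose(X), X)                      # X'X
J = jacobian_columns(lambda b: matvec(A, b), beta)
H = [[2 * a for a in row] for row in J]          # Hessian of phi: d/dbeta (2 X'X beta)

print(J == A)  # True
```

Note that the result does not depend on which `beta` you start from, which is exactly why the Hessian of $\phi$ is the constant matrix $2X'X$.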

Another answer:

Rather than expand and then differentiate, do the differentiation first.

Define the vector $z = X\beta - y$ and write the function as an inner product, which I'll denote with a colon for ease of typing, i.e. $a:b = a^Tb$. Since $y$ does not depend on $\beta$, we have $dz = X\,d\beta$, and the identity $a:Xb = X^Ta:b$ lets you move $X$ to the other side of the colon. Then it's simple to find the differential, the gradient, and the Hessian:
$$\begin{aligned}
\phi &= z:z \\
d\phi &= 2z:dz = 2z:X\,d\beta = 2X^Tz:d\beta \\
g = \frac{\partial\phi}{\partial\beta} &= 2X^Tz = 2X^T(X\beta-y) \\
dg &= 2X^T dz = 2X^TX\,d\beta \\
H = \frac{\partial g}{\partial\beta} &= 2X^TX
\end{aligned}$$
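Because $\phi$ is quadratic, the gradient $g$ and Hessian $H$ from this answer must satisfy $\phi(\beta + d) = \phi(\beta) + g:d + \tfrac12\, d:(Hd)$ exactly, with no remainder. The sketch below checks that identity in plain Python; the data `X`, `y`, `beta`, `d` are hypothetical integers (so the comparison is exact) and the helper names are illustrative only.

```python
# Sketch: check g = 2X'(X beta - y) and H = 2X'X through the exact expansion
#   phi(beta + d) = phi(beta) + g:d + (1/2) d:(H d),
# which has no remainder term because phi is quadratic. Here a:b = a'b.

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inner(a, b):
    """The colon product a:b = a'b for vectors."""
    return sum(x * y for x, y in zip(a, b))


def residual(X, y, beta):
    """z = X beta - y."""
    return [xb - yi for xb, yi in zip(matvec(X, beta), y)]


def phi(X, y, beta):
    z = residual(X, y, beta)
    return inner(z, z)  # phi = z:z


def gradient(X, y, beta):
    return [2 * v for v in matvec(transpose(X), residual(X, y, beta))]  # g = 2X'z


def hessian(X):
    cols = transpose(X)  # columns of X
    return [[2 * inner(ci, cj) for cj in cols] for ci in cols]  # H = 2X'X


# Hypothetical integer data, so the comparison below is exact.
X = [[1, 2], [1, 0], [1, -1], [1, 3]]
y = [3, 1, 0, 4]
beta = [1, -2]
d = [2, 1]

g = gradient(X, y, beta)
H = hessian(X)
lhs = phi(X, y, [b + di for b, di in zip(beta, d)])
rhs = phi(X, y, beta) + inner(g, d) + inner(d, matvec(H, d)) // 2  # d:(Hd) is even

print(lhs == rhs)  # True
```

The integer division is safe here because every entry of $H = 2X^TX$ is even for integer data, so $d:(Hd)$ is always even.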