Derivative of a quadratic form using the chain rule


I want to differentiate $$ \frac{\partial}{\partial b} \left( [Z^{\top} (y - X b)]^{\top} W [Z^{\top}(y - X b)] \right) $$

Note that $b : K \times 1$, $y : G \times 1$, $X : G \times K$, $Z : G \times L$, $W : L \times L$.

I know that $\frac{\partial}{\partial x} x^{\top} A x = (A^{\top} + A)x$, but I don't see how to combine that with the chain rule here.

How can I compute this derivative?

(This equation is needed to solve GMM in econometrics.)



Let $B := ZWZ^T$. Then $$ \frac{\partial}{\partial b} \left( [Z^{\top} (y - X b)]^{\top} W [Z^{\top}(y - X b)] \right) =\frac{\partial}{\partial b} (y-Xb)^T B (y-Xb) $$ Expanding gives $$ \frac{\partial}{\partial b} [y^TBy - y^T BX b - b^T X^T B y + b^T X^T B X b ]$$ Now differentiate each term with respect to $b$ using the standard rules listed on the wiki page, namely $\frac{\partial}{\partial b} a^T b = a$ and $\frac{\partial}{\partial b} b^T A b = (A + A^T) b$. The first term does not depend on $b$, and the rest give $$ -X^T B^T y - X^T B y + X^T (B + B^T) X b = -X^T (B + B^T)(y - Xb) $$
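As a quick sanity check, term-by-term differentiation of the expansion above should give $-X^T(B + B^T)(y - Xb)$, and this can be compared against a finite-difference gradient. The sketch below assumes NumPy is available; the dimensions, the random data and the helper names (`objective`, `analytic_gradient`, `numerical_gradient`) are illustrative choices, not part of the original problem. $W$ is deliberately left non-symmetric so that the $B + B^T$ term is actually exercised.

```python
import numpy as np

rng = np.random.default_rng(0)
G, K, L = 6, 3, 4  # arbitrary illustrative dimensions

y = rng.normal(size=G)
X = rng.normal(size=(G, K))
Z = rng.normal(size=(G, L))
W = rng.normal(size=(L, L))  # not symmetric on purpose
b = rng.normal(size=K)

B = Z @ W @ Z.T  # B := Z W Z^T, a G x G matrix


def objective(b):
    """(y - X b)^T B (y - X b), the quadratic form being differentiated."""
    r = y - X @ b
    return r @ B @ r


def analytic_gradient(b):
    """Closed form obtained by differentiating the expansion term by term."""
    return -X.T @ (B + B.T) @ (y - X @ b)


def numerical_gradient(f, b, h=1e-4):
    """Central finite differences, one coordinate of b at a time."""
    grad = np.zeros_like(b)
    for i in range(b.size):
        e = np.zeros_like(b)
        e[i] = h
        grad[i] = (f(b + e) - f(b - e)) / (2 * h)
    return grad


g_analytic = analytic_gradient(b)
g_numeric = numerical_gradient(objective, b)
print("max abs difference:", np.max(np.abs(g_analytic - g_numeric)))
```

Because the objective is quadratic in $b$, central differences have no truncation error, so the two gradients should agree up to floating-point roundoff.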


$\def\p{\partial}$ Define the vector $$\eqalign{ p &= Z^T(Xb-y) \quad\implies\quad dp = Z^TX\,db \\ }$$ Write the function in terms of this new vector and calculate its gradient $$\eqalign{ \phi &= p^TWp \\&= W:pp^T \\\\ d\phi &= W:(dp\,p^T+p\,dp^T) \\&= (W+W^T):dp\,p^T \\ &= (W+W^T)p:dp \\ &= (W+W^T)p:Z^TX\,db \\ &= X^TZ(W+W^T)p:db \\ &= X^TZ(W+W^T)Z^T(Xb-y):db \\\\ \frac{\p\phi}{\p b} &= X^TZ(W+W^T)Z^T(Xb-y) \\ }$$


In the above, a colon has been used to denote the trace/Frobenius product $$\eqalign{A:B = {\rm Tr}(A^TB) = {\rm Tr}(B^TA) = B:A}$$ The properties of the trace allow terms in such a product to be rearranged in many ways, e.g. $$\eqalign{A:BC &= B^TA:C = AC^T:B = \text{etc.}}$$
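The rearrangement rules for the Frobenius product are easy to confirm numerically. The following is a minimal sketch assuming NumPy; the helper `frob` and the matrix shapes are hypothetical choices made only so that every product is well defined.

```python
import numpy as np

rng = np.random.default_rng(1)


def frob(A, B):
    """Frobenius product A:B = Tr(A^T B) for matrices of equal shape."""
    return np.trace(A.T @ B)


A = rng.normal(size=(3, 5))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))

lhs = frob(A, B @ C)      # A : BC
mid = frob(B.T @ A, C)    # B^T A : C
rhs = frob(A @ C.T, B)    # A C^T : B

# The product is symmetric and equals the sum of elementwise products.
sym = frob(B @ C, A)
elementwise = np.sum(A * (B @ C))

print(lhs, mid, rhs, sym, elementwise)
```

All five printed values should coincide up to roundoff, which is what licenses moving factors across the colon in the derivation.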