Differentiating matrices with respect to a vector


Given a matrix $X$ (not necessarily square) and a vector $b$, how can I obtain the following equality? $$\frac{\partial (b^t X^t X b)}{\partial b} = (X^t X ) b $$

Why is this wrong? $$\frac{\partial (b^t X^t X b)}{\partial b} = \frac{\partial \left( (X b )^t X b \right)}{\partial b} = \frac{\partial ( X b )^t}{\partial b} X b + (X b)^t\frac{\partial (X b)}{\partial b} = X^t X b + (X b)^t X $$

Also, how can I calculate the following directly, without multiplying out the terms in the numerator? $$ \frac{\partial \left( (y-X b)^t (y-X b) \right)}{\partial b}$$


There are 2 best solutions below

0
On BEST ANSWER

It's known that $$ \frac{\partial x^tAx}{\partial x}=(A+A^t)x $$ (see for example here).

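Thus, $$\begin{align} \frac{\partial b^tX^tXb}{\partial b}&=(X^tX+(X^tX)^t)b &\\ &=2X^tXb &\text{since $X^tX$ is always symmetric.} \end{align} $$

For the second expression, expand $$ (y - Xb)^t(y - Xb) = y^ty - y^tXb - b^tX^ty + b^tX^tXb = y^ty - 2b^t(X^ty) + b^t(X^tX)b, $$ where the two cross terms combine because $y^tXb$ is a scalar and therefore equals its transpose $b^tX^ty$. Differentiating term by term, we have $$ \frac{\partial (y - Xb)^t(y - Xb) }{\partial b}= -2X^ty + 2(X^tX)b, $$ where we have again used the fact that $X^tX$ is symmetric.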
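If you want to convince yourself numerically, here is a minimal sketch that compares these closed forms against central finite differences. It assumes NumPy is available; the helper `numerical_gradient`, the random test data, and the tolerances are my own illustrative choices, not part of the derivation.

```python
import numpy as np

def numerical_gradient(f, b, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at b (hypothetical helper)."""
    grad = np.zeros_like(b)
    for i in range(b.size):
        step = np.zeros_like(b)
        step[i] = eps
        grad[i] = (f(b + step) - f(b - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # deliberately non-square
A = rng.standard_normal((3, 3))   # deliberately non-symmetric
y = rng.standard_normal(5)
b = rng.standard_normal(3)

# The three scalar functions discussed above.
general = lambda v: v @ A @ v               # x^t A x
quad = lambda v: v @ X.T @ X @ v            # b^t X^t X b
rss = lambda v: (y - X @ v) @ (y - X @ v)   # (y - Xb)^t (y - Xb)

# Closed-form gradients from the answer.
grad_general = (A + A.T) @ b
grad_quad = 2 * X.T @ X @ b
grad_rss = -2 * X.T @ y + 2 * X.T @ X @ b

general_ok = bool(np.allclose(numerical_gradient(general, b), grad_general, atol=1e-5))
quad_ok = bool(np.allclose(numerical_gradient(quad, b), grad_quad, atol=1e-5))
rss_ok = bool(np.allclose(numerical_gradient(rss, b), grad_rss, atol=1e-5))
print(general_ok, quad_ok, rss_ok)
```

The non-symmetric `A` is there to show that the general rule really needs $A + A^t$; the factor of 2 only appears once the matrix is symmetric, as $X^tX$ always is.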

3
On

Neither of those results is quite right. The derivative of $f = b^T X^T X b$ is

$$ \eqalign { \frac {\partial f} {\partial b} &= 2 X^TXb \cr } $$

To derive this, express the function in terms of the Frobenius product and rearrange the differential until you isolate $db$ on the RHS.

$$ \eqalign { f &= Xb:Xb \cr \cr df &= 2(Xb):d(Xb) \cr &= 2Xb:X db \cr &= 2X^TXb:db \cr }$$

Since the differential has the form $df = g:db$, the factor multiplying $db$ is the gradient, i.e. $g = 2X^TXb$.

You can also get to the same result using index notation. You'll end up with an expression with two terms, $(b_iX^T_{ij}X_{jk}{db}_k + {db}_iX^T_{ij}X_{jk}b_k)$.

Then you just have to remember that $X^TX$ is symmetric so that $X^T_{ij}X_{jk} = X^T_{kj}X_{ji}$, which allows you to combine the two terms.
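Written out, the index-notation route might look like the following sketch. The shorthand $S = X^TX$ (so $S_{ik} = X^T_{ij}X_{jk}$) is a label introduced here only for brevity, with repeated indices summed.

$$ \eqalign { df &= b_i S_{ik}\,{db}_k + {db}_i S_{ik} b_k \cr &= S_{ki} b_i\,{db}_k + S_{ik} b_k\,{db}_i \cr &= 2\,S_{ik} b_k\,{db}_i \cr } $$

The second line uses $S_{ik} = S_{ki}$ in the first term, and the third line relabels the dummy indices so that both terms are identical. Reading off the coefficient of ${db}_i$ gives $\partial f/\partial b_i = 2\,(X^TXb)_i$.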

For your second question, anytime you have a function which is the product of 2 identical terms, i.e. $f = w:w$, then the derivative/differential is of the form $df = 2w:dw$. This result was applied in the preceding derivation.
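Applied to the second question, a sketch of the steps with $w = y - Xb$ (so that $dw = -X\,db$, since $y$ does not depend on $b$) would be:

$$ \eqalign { f &= (y-Xb):(y-Xb) \cr \cr df &= 2(y-Xb):d(y-Xb) \cr &= 2(y-Xb):(-X\,db) \cr &= -2X^T(y-Xb):db \cr \cr \frac {\partial f} {\partial b} &= -2X^T(y-Xb) = 2X^TXb - 2X^Ty \cr } $$

No expansion of the product is needed; the only manipulation is moving $X$ across the Frobenius product by transposing it.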

Updated

A quick review of the algebra of Frobenius products might make the above answer less "incomprehensible". It's nothing too deep, and flows easily from the definition, $$ A:B \equiv {\rm tr}(A^TB) $$ Just as there are mixed-product rules for Kronecker products $$ \eqalign { (AB)\otimes(XY) &= (A\otimes X)(B\otimes Y) \cr }$$ there are mixed-product rules for Frobenius products $$ \eqalign { (AB):(X) &= (A):(XB^T) \cr &= (B):(A^TX) \cr } $$ Basically you can move a matrix to the opposite side of the Frobenius product if you transpose it, and retain its relative position (RHS or LHS) on the other side.

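Similar to the rule for transposing Kronecker products $$ \eqalign { A^T\otimes B^T &= (A\otimes B)^T \cr } $$ there's a rule for Frobenius products $$ \eqalign { A^T:B^T &= (A:B)^T \cr } $$ Here both sides are scalars, so the transpose on the right-hand side changes nothing and the rule simply says $A^T:B^T = A:B$.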

Frobenius products are also commutative, distributive, and follow the standard product rule under differentiation $$ \eqalign { A:B &= B:A \cr A:(B+C) &= (A:B) + (A:C) \cr d(A:B) &= (dA:B) + (A:dB) \cr } $$ which makes algebraic manipulations quite simple. For example, $$ \eqalign { d(w:w) &= (dw:w) + (w:dw) \cr &= (w:dw) + (w:dw) \cr &= 2 w:dw \cr } $$
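For readers who prefer to see these rules checked on concrete numbers, here is a small sketch, again assuming NumPy. The function name `frob` and the random matrices are illustrative choices of mine; the only thing taken from the text is the definition $A:B = {\rm tr}(A^TB)$.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = tr(A^T B), i.e. the sum of elementwise products."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 2))
M = rng.standard_normal((4, 2))   # same shape as A @ B
P = rng.standard_normal((4, 3))   # same shape as A
Q = rng.standard_normal((4, 3))   # same shape as A

# Mixed-product rule: (AB):M = A:(M B^T) = B:(A^T M)
lhs = frob(A @ B, M)
mixed_ok = bool(np.isclose(lhs, frob(A, M @ B.T)) and np.isclose(lhs, frob(B, A.T @ M)))

# Commutativity, distributivity, and the transpose rule
comm_ok = bool(np.isclose(frob(A, P), frob(P, A)))
dist_ok = bool(np.isclose(frob(A, P + Q), frob(A, P) + frob(A, Q)))
transpose_ok = bool(np.isclose(frob(A.T, P.T), frob(A, P)))

# The definition agrees with the elementwise sum
elementwise_ok = bool(np.isclose(frob(A, P), np.sum(A * P)))
print(mixed_ok, comm_ok, dist_ok, transpose_ok, elementwise_ok)
```

The last check is a useful way to think about the product: it is just the ordinary dot product of the two matrices viewed as long vectors, which is why it reduces to $b^Tc$ when the arguments are column vectors.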