Derivative of $(Ax - b)^T(Ax-b)$


I am trying to take the derivative of $(Ax - b)^T(Ax-b)$ and set it to zero without expanding the product, using only matrix calculus. I know that the partial derivative of $x^Tx$ with respect to $x$ is $2x$, and I took the derivative of $Ax - b$ to be $A^T$. Applying the chain rule, $(f \circ g)' = (f'(g))(g')$, I obtained

$$ 2(Ax-b)A^T = 0 $$

or

$$ (2Ax-2b)A^T = 0 $$

However, the dimensions do not line up here. If I had

$$ 2A^T(Ax-b) = 0 $$

it would be fine. How can I get this result? Is there a rule I skipped that allows the statement above to be the derivative?

Note:

From this link I found out

$$ \frac{\partial}{\partial t} f(g(t)) = \nabla f(g(t))^T \frac{\partial g}{\partial t} $$

If I had $g(x) = Ax-b$ and $f(z) = z^Tz$ then I could have

$$ \frac{\partial}{\partial x} f(g(x)) = 2z^TA = 2(Ax-b)^TA = 0 $$

Taking the transpose of both sides gives

$$ 2A^T(Ax-b) = 0 $$

So I guess my original chain rule was wrong. Does this look correct? If yes, does anyone know where I can find the derivation of the correct Chain Rule?

Best answer

No. With vectors and matrices, the order of multiplication matters. One way to see what is going on is to write $A$ in terms of its columns $a_1, \dots, a_n$:

$$Ax = \left[ \begin{array}{ccc} a_1 & \cdots & a_n \end{array} \right] \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \end{array} \right] = x_1 a_1 + \cdots + x_n a_n,$$

so that $\partial Ax / \partial x_1 = a_1$. Thus,

$$\frac{\partial (Ax-b)}{\partial x'} = A.$$
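As a sketch of how the chain rule plays out entry by entry, write $r = Ax - b$ (the name $r$ is introduced here only for illustration) and let $A_{ij}$ denote the entries of $A$. Each component of $r$ depends on $x_j$ through a single coefficient:

$$\frac{\partial r_i}{\partial x_j} = \frac{\partial}{\partial x_j}\Big(\sum_k A_{ik} x_k - b_i\Big) = A_{ij}.$$

Summing over the components of $r$ then gives the $j$-th partial derivative of the squared norm:

$$\frac{\partial}{\partial x_j} \sum_i r_i^2 = \sum_i 2\, r_i\, A_{ij} = 2 \sum_i (A')_{ji}\, r_i = \big[\, 2 A' (Ax-b) \,\big]_j .$$

The sum runs over the row index $i$ of $A$, which is why the transpose ends up on the left of $(Ax-b)$ when the partial derivatives are stacked into a column vector.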

The result you're looking for is therefore

$2 A' (Ax-b)$,

where $'$ denotes the transpose (written $^T$ in the question).
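One way to sanity-check this formula is to compare it against a finite-difference approximation. The snippet below is a minimal sketch in plain Python, not part of the original answer; the function names and the small test matrix are made up for illustration.

```python
def residual(A, x, b):
    """Return the vector Ax - b, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
            for row, b_i in zip(A, b)]


def objective(A, x, b):
    """Return (Ax - b)^T (Ax - b), the squared norm of the residual."""
    return sum(r_i * r_i for r_i in residual(A, x, b))


def gradient(A, x, b):
    """Return the analytic gradient 2 A^T (Ax - b) as a list."""
    r = residual(A, x, b)
    # Entry j sums over the rows of A, which is multiplication by A^T.
    return [2.0 * sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(x))]


def numerical_gradient(A, x, b, h=1e-6):
    """Approximate the gradient with central differences of step h."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((objective(A, x_plus, b) - objective(A, x_minus, b)) / (2 * h))
    return grad


# Hypothetical 3x2 example, chosen only so the numbers are easy to follow.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]

analytic = gradient(A, x, b)
numeric = numerical_gradient(A, x, b)
print("analytic:", analytic)
print("numeric: ", numeric)
```

With a non-square $A$ such as this one, the shapes also show the point made in the question: $A'(Ax-b)$ has one entry per component of $x$, whereas $(Ax-b)A'$ is not even a valid product.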