Matrix calculus - incorrect calculation?

I understand that matrix derivatives involve different possible conventions but even with that in mind, I'm not sure how to show the result I have.

It is given according to this calculation that $$\nabla_x ||Ax - b||^2 = A^T(Ax - b),$$

where $x$ is a vector, $A$ is a matrix and $b$ is also a vector. I'd like to express this as

$$\frac{\partial||Ax - b||^2}{\partial x} = \frac{\partial((Ax -b)^T(Ax - b))}{\partial x}$$ and use the product rule to get two terms

$$(Ax -b)^T\frac{\partial(Ax - b)}{\partial x} + \frac{\partial(Ax - b)^T}{\partial x}(Ax -b).$$

But now, assuming $\frac{\partial x^T}{\partial x} = I$, I seem to get $(Ax -b)^TA + A^T(Ax -b)$ so I've got an extra term in the answer!

Best answer

First, notice that $$ ||Ax-b||^2 = (Ax-b)^T(Ax-b) = (x^TA^T - b^T)(Ax-b) = x^TA^TAx - b^TAx - x^TA^Tb + b^Tb $$ Since $x^TA^Tb$ is a scalar, it equals its own transpose, so $x^TA^Tb = b^TAx$. Hence, $$ ||Ax-b||^2 = x^TA^TAx - 2x^TA^Tb + b^Tb $$

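Now take the gradient, using $\nabla_x (x^TMx) = (M + M^T)x$ with $M = A^TA$ and $\nabla_x (x^Tc) = c$ with $c = A^Tb$: $$ \nabla_x ||Ax-b||^2 = (A^TA + A^TA)x - 2A^Tb $$ which simplifies to $$\nabla_x ||Ax-b||^2 = 2A^T(Ax - b) $$ Note the factor of 2, which is missing from the expression quoted in the question.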
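If it helps, the result can be sanity-checked numerically. The following is only a sketch: it assumes NumPy is available, and the helper names (`objective`, `analytic_gradient`, `numerical_gradient`), the matrix sizes, and the random test data are all made up for illustration. It compares $2A^T(Ax - b)$ against a central finite-difference approximation of the gradient.

```python
import numpy as np


def objective(A, b, x):
    """Squared Euclidean norm ||Ax - b||^2."""
    r = A @ x - b
    return float(r @ r)


def analytic_gradient(A, b, x):
    """Closed-form gradient 2 A^T (Ax - b), as a 1-D array."""
    return 2.0 * A.T @ (A @ x - b)


def numerical_gradient(A, b, x, eps=1e-6):
    """Central finite differences, one coordinate of x at a time."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (objective(A, b, x + step) - objective(A, b, x - step)) / (2 * eps)
    return grad


# Arbitrary test data (hypothetical sizes: A is 5x3, so x has 3 entries).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

g_analytic = analytic_gradient(A, b, x)
g_numeric = numerical_gradient(A, b, x)

# Reports whether the two gradients agree to within the tolerance.
print(np.allclose(g_analytic, g_numeric, atol=1e-5))
```

Dropping the factor of 2 in `analytic_gradient` should make the comparison fail, which is a quick way to see that the factor is really there.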

To prove the differentiation step, it is best to work componentwise, taking $$\nabla_x = \begin{pmatrix}\frac{\partial}{\partial x_1} & \cdots & \frac{\partial}{\partial x_n} \end{pmatrix}^T$$
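
As a sketch of that manual computation, write $A_{ij}$ for the entries of $A$ and $b_i$ for the entries of $b$ (this index notation is introduced here and is not in the question). Then $$ ||Ax-b||^2 = \sum_i \Big(\sum_j A_{ij}x_j - b_i\Big)^2, $$ and differentiating with respect to a single component $x_k$ gives $$ \frac{\partial}{\partial x_k}||Ax-b||^2 = \sum_i 2\Big(\sum_j A_{ij}x_j - b_i\Big)A_{ik} = 2\big[A^T(Ax-b)\big]_k. $$ Stacking these components for $k = 1, \dots, n$ gives $\nabla_x ||Ax-b||^2 = 2A^T(Ax-b)$.

This also seems to account for the "extra term" in the question: $(Ax-b)^TA$ is a row vector whose $k$-th entry is $\sum_i (Ax-b)_i A_{ik}$, which is exactly the $k$-th entry of the column vector $A^T(Ax-b)$. The two product-rule terms are therefore the same derivative written in two different layouts, and once both are written in the same layout they add up to the factor of 2.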