Chain rule for derivative of a norm

Suppose that $A$ is an $M \times N$ matrix, $x$ is an $N \times 1$ vector and $b$ is an $M \times1$ vector.

I want to compute $\frac{d}{dx}\|Ax+b\|^2_{2}$.

According to this link, the answer should be $$2A^{T}(Ax+b).$$

However, the chain rule gives me the transpose of this expression:

$$\frac{d\|Ax+b\|^2_{2}}{d(Ax+b)} \cdot \frac{d(Ax+b)}{dx}=2(Ax+b)^{T}A$$

Which answer is the correct one?

There are 1 best solutions below

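Both are correct: the two expressions are transposes of each other, $\left(2(Ax+b)^{T}A\right)^{T} = 2A^{T}(Ax+b)$, and they represent the same derivative under two different conventions.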

The difference lies in the definition of the derivative. Let $f: E \to F$ be a function.

If the definition of the derivative is:

$$f(x+h)=f(x)+\frac{df}{dx} \cdot h +o(\|h\|)$$

then your answer is the correct one.

But if the definition is:

$$f(x+h)=f(x) + \left( \frac{df}{dx} \right)^T \cdot h +o(\| h\|)$$

then the answer from the link is the correct one.
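To see where both forms come from, one can expand $f(x)=\|Ax+b\|_2^2$ directly. This is only a sketch, with $h$ denoting an arbitrary small $N \times 1$ perturbation:

$$\begin{aligned}
f(x+h) &= \|(Ax+b)+Ah\|_2^2 \\
&= \|Ax+b\|_2^2 + 2(Ax+b)^{T}A\,h + \|Ah\|_2^2 \\
&= f(x) + 2(Ax+b)^{T}A\,h + o(\|h\|).
\end{aligned}$$

Matching the term that is linear in $h$ against the first definition gives the $1 \times N$ row vector $\frac{df}{dx} = 2(Ax+b)^{T}A$. Matching it against the second definition gives $\left(\frac{df}{dx}\right)^{T} = 2(Ax+b)^{T}A$, that is, the $N \times 1$ column vector $\frac{df}{dx} = 2A^{T}(Ax+b)$.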


In my opinion, the first definition is the more natural and the more widely used. The second is closer to the case of functions from $\mathbb R$ to $\mathbb R$ and can be very useful; however, the object it defines is usually called the gradient rather than the derivative.
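As a quick sanity check, the two conventions can be compared numerically. The following is a minimal NumPy sketch, not part of the original answer: the sizes `M` and `N`, the random test data, and the helper name `finite_difference_gradient` are all assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(0)
M, N = 4, 3
A = rng.standard_normal((M, N))
b = rng.standard_normal((M, 1))
x = rng.standard_normal((N, 1))


def f(x):
    """Squared Euclidean norm of Ax + b, returned as a Python float."""
    r = A @ x + b
    return float((r.T @ r)[0, 0])


def finite_difference_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the partial derivatives of f at x."""
    g = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        e = np.zeros_like(x)
        e[i, 0] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g


r = A @ x + b
jac_row = 2 * r.T @ A    # first convention: 1 x N row vector 2 (Ax+b)^T A
grad_col = 2 * A.T @ r   # second convention: N x 1 column vector 2 A^T (Ax+b)
fd = finite_difference_gradient(f, x)

print(jac_row.shape, grad_col.shape)
print(np.allclose(jac_row.T, grad_col))
print(np.allclose(grad_col.ravel(), fd, atol=1e-5))
```

The two arrays hold the same numbers and differ only in shape, which is the point made above: the choice between them is a layout convention, not a difference in the derivative itself.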