Chain rule for derivative of a norm

1.2k Views Asked by Bumbble Comm At 25 Mar 2026 - 6:51

Suppose that $A$ is an $M \times N$ matrix, $x$ is an $N \times 1$ vector and $b$ is an $M \times1$ vector.

I want to compute $\frac{d}{dx}\|Ax+b\|^2_{2}$.

According to this link, the answer should be: $$2A^{T}(Ax+b)$$.

However, the chain rule gives me the transpose of this expression:

$$\frac{d\|Ax+b\|^2_{2}}{d(Ax+b)} \cdot \frac{d(Ax+b)}{dx}=2(Ax+b)^{T}A$$

Which answer is the correct one?


Bumbble Comm On 12 Oct 2018 - 1:45

Both are correct: they are the same derivative written under two different conventions, and each is the transpose of the other.

The difference lies in how the derivative is defined. Let $f: E \to F$ be a function.

If the definition of the derivative is:

$$f(x+h)=f(x)+\frac{df}{dx} \cdot h +o(\|h\|)$$

then your answer is the correct one.

But if the definition is:

$$f(x+h)=f(x) + \left( \frac{df}{dx} \right)^T \cdot h +o(\| h\|)$$

then the answer from the link is the correct one.
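For the function in the question, $f(x)=\|Ax+b\|_2^2$, expanding directly shows where each form comes from:

$$f(x+h)=\|Ax+b+Ah\|_2^2 = f(x) + 2(Ax+b)^{T}A\,h + \|Ah\|_2^2$$

The last term is $o(\|h\|)$. Matching the linear term against the first definition gives $\frac{df}{dx}=2(Ax+b)^{T}A$, a $1 \times N$ row vector. Matching it against the second definition gives $\frac{df}{dx}=2A^{T}(Ax+b)$, an $N \times 1$ column vector.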

In my opinion, the first definition is the more natural and the more widely used. The second is closer to the case of a function from $\mathbb R$ to $\mathbb R$ and can be very useful; however, the resulting object is usually called the gradient rather than the derivative.
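As a quick sanity check, the two forms can be compared numerically. The sketch below is only an illustration and assumes NumPy is available. The helper names (`f`, `gradient`, `derivative_row`, `finite_difference_gradient`), the sizes $M=4$, $N=3$, and the random test data are all made up for this example and are not part of the question.

```python
import numpy as np


def f(A, x, b):
    """Squared Euclidean norm of Ax + b."""
    r = A @ x + b
    return float(r @ r)


def gradient(A, x, b):
    """Column (gradient) convention: 2 A^T (Ax + b), shape (N,)."""
    return 2.0 * A.T @ (A @ x + b)


def derivative_row(A, x, b):
    """Row (derivative) convention: 2 (Ax + b)^T A, shape (1, N)."""
    return 2.0 * (A @ x + b).reshape(1, -1) @ A


def finite_difference_gradient(A, x, b, eps=1e-6):
    """Central differences, perturbing one coordinate of x at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(A, x + e, b) - f(A, x - e, b)) / (2 * eps)
    return g


# Hypothetical test data: M = 4, N = 3, fixed seed for reproducibility.
rng = np.random.default_rng(0)
M, N = 4, 3
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)
b = rng.standard_normal(M)

grad = gradient(A, x, b)
row = derivative_row(A, x, b)
fd = finite_difference_gradient(A, x, b)

# The gradient agrees with finite differences up to rounding error.
print(np.allclose(grad, fd, atol=1e-5))
# The row form holds the same numbers, laid out as a 1 x N matrix.
print(np.allclose(row.ravel(), grad))
print(row.shape, grad.shape)
```

Both expressions hold the same $N$ numbers. Only the shape differs, which is exactly the difference between the two definitions above.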
