Differentiating the squared Euclidean norm $\Vert Ax+b\Vert^{2}$


Let $A$ be an $m\times n$ real matrix, $x$ an $n\times 1$ vector and $b$ an $m\times 1$ vector. I want to compute \begin{equation} \dfrac{\partial }{\partial x} \Vert Ax+b\Vert^{2}. \end{equation} First, I expanded \begin{equation} \Vert Ax+b\Vert^{2}=(Ax+b)^{T}(Ax+b)=x^{T}A^{T}Ax+2x^{T}A^{T}b+b^{T}b \end{equation} then I computed \begin{eqnarray} \dfrac{\partial }{\partial x}(x^{T}A^{T}Ax+2x^{T}A^{T}b+b^{T}b)=A^{T}Ax+x^{T}A^{T}A+2A^{T}b \end{eqnarray} but I know this is wrong, since $A^{T}Ax$ is an $n\times 1$ column vector while $x^{T}A^{T}A$ is a $1\times n$ row vector, so the two cannot be added. Thanks for the help.
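The expansion step itself is correct: the two cross terms merge because $b^{T}Ax$ is a scalar and therefore equals its transpose $x^{T}A^{T}b$. A minimal numerical sketch in plain Python (no NumPy assumed) confirms it; the helpers `matvec`, `transpose`, `dot` and the values of `A`, `x`, `b` are arbitrary illustrations, not part of the question.

```python
# Small dense matrices are stored as lists of rows; vectors as flat lists.
def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

def dot(u, v):
    """Return the inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Arbitrary example data: A is 3x2 (m=3, n=2), x has length 2, b has length 3.
A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
x = [0.7, -1.3]
b = [2.0, -0.5, 1.0]

# Left-hand side: ||Ax + b||^2 computed directly from the residual.
r = [ai + bi for ai, bi in zip(matvec(A, x), b)]
lhs = dot(r, r)

# Right-hand side: x^T A^T A x + 2 x^T A^T b + b^T b, term by term.
Ax = matvec(A, x)
quad = dot(Ax, Ax)                            # x^T A^T A x = (Ax)^T (Ax)
cross = 2 * dot(x, matvec(transpose(A), b))   # 2 x^T A^T b
const = dot(b, b)                             # b^T b
rhs = quad + cross + const

print(abs(lhs - rhs) < 1e-9)  # prints True
```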



Rather than expanding first, do the opposite. Define a new vector $$y=Ax+b$$ and write the function in terms of this new variable and the Frobenius product (which I'll denote by a colon). This approach reduces the visual "clutter". You can then expand the results after finding the derivative.

With the Frobenius product, finding the gradient is easy and fool-proof. Since $b$ is constant, $dy = A\,dx$, so $$\eqalign{ f &= \|y\|^2 = y:y \cr \cr df &= 2\,y:dy \cr &= 2\,y:A\,dx \cr &= 2\,A^Ty:dx\cr \cr \frac{\partial f}{\partial x} &= 2\,A^Ty \cr &= 2\,A^T(Ax+b) \cr \cr }$$ The rules for rearranging the Frobenius product $$\eqalign{ A:B &= B:A \cr A:BC &= B^TA:C = AC^T:B\cr }$$ can be derived from the familiar properties of the trace (cyclic invariance and ${\rm tr}(M)={\rm tr}(M^T)$), since $$A:B={\rm tr}(A^TB)$$
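The rearrangement rules can be spot-checked numerically. This is a minimal plain-Python sketch under the assumption that the Frobenius product is computed as the sum of elementwise products, which equals ${\rm tr}(A^TB)$; the helper names (`matmul`, `frob`, `trace`) and the matrix values are hypothetical, chosen only so that $A:BC$ is defined ($A$ is $2\times3$, $B$ is $2\times2$, $C$ is $2\times3$).

```python
def matmul(P, Q):
    """Return the matrix product P Q (matrices as lists of rows)."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

def frob(P, Q):
    """Frobenius product P:Q, i.e. the sum of elementwise products."""
    return sum(p * q for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

# Arbitrary example matrices with compatible shapes.
A = [[1.0, -2.0, 0.5], [3.0, 1.0, -1.0]]
B = [[2.0, 1.0], [0.0, -1.0]]
C = [[1.0, 0.0, 2.0], [-1.0, 3.0, 1.0]]

BC = matmul(B, C)
lhs = frob(A, BC)                          # A : BC
mid = frob(matmul(transpose(B), A), C)     # B^T A : C
rhs = frob(matmul(A, transpose(C)), B)     # A C^T : B
via_trace = trace(matmul(transpose(A), BC))  # tr(A^T (BC)), definition of A : BC

print(lhs, mid, rhs, via_trace)  # prints: -1.5 -1.5 -1.5 -1.5
```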


I was also interested in this and found some useful information in the Matrix Cookbook. Note that in your case $b = -q$.

\begin{equation} \begin{aligned} f(x) &= ||Ax-q||^2 \\ &= (Ax-q)^T(Ax-q) \\ &= (x^TA^T-q^T)(Ax-q) \\ &= x^TA^TAx - x^TA^Tq - q^TAx + q^Tq \\ &= x^TA^TAx - q^TAx - q^TAx + q^Tq \\ &= x^TA^TAx - 2q^TAx + q^Tq \\ \end{aligned} \end{equation}

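The fifth line uses $x^TA^Tq = q^TAx$, which holds because a scalar equals its own transpose. Now differentiate $f(x)$ with respect to $x$, using the formula $\frac{d}{dx}(x^THx + c^Tx) = (H+H^T)x+c$ with $H = A^TA$ and $c^T = -2q^TA$.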
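The quoted formula $\frac{d}{dx}(x^THx + c^Tx) = (H+H^T)x+c$ holds for any square $H$, symmetric or not. A minimal sketch comparing it against central finite differences in plain Python; the non-symmetric `H`, the vectors `c` and `x`, the step `h`, and the helper names are all illustrative assumptions.

```python
def f(H, c, x):
    """Evaluate x^T H x + c^T x."""
    n = len(x)
    quad = sum(x[i] * H[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(ci * xi for ci, xi in zip(c, x))

def grad_formula(H, c, x):
    """Closed-form gradient (H + H^T) x + c."""
    n = len(x)
    return [sum((H[i][j] + H[j][i]) * x[j] for j in range(n)) + c[i]
            for i in range(n)]

def grad_central_diff(H, c, x, h=1e-6):
    """Approximate each partial derivative by (f(x + h e_k) - f(x - h e_k)) / 2h."""
    g = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        g.append((f(H, c, xp) - f(H, c, xm)) / (2 * h))
    return g

H = [[1.0, 4.0], [-2.0, 3.0]]   # deliberately non-symmetric
c = [0.5, -1.0]
x = [1.2, -0.7]

exact = grad_formula(H, c, x)
approx = grad_central_diff(H, c, x)
print(exact, approx)
```

Using a non-symmetric `H` matters here: for symmetric $H$ the formula reduces to $2Hx + c$, which would not distinguish it from the common mistake of writing $Hx + c$ or $2Hx + c$ in general.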

\begin{equation} \begin{aligned} \frac{df}{dx} &= \frac{d}{dx}(x^TA^TAx) + \frac{d}{dx}(-2q^TAx) + \frac{d}{dx}(q^Tq) \\ &= ((A^TA)+(A^TA)^T)x + (-2q^TA)^T + 0 \\ &= 2A^TAx - 2A^Tq \\ &= 2A^T(Ax - q) \\ \end{aligned} \end{equation}

Since we set $b = -q$, substituting back gives $\frac{df}{dx} = 2A^T(Ax+b)$.
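Both answers arrive at $2A^T(Ax+b)$. A minimal plain-Python sketch checks this closed form against central finite differences of $f(x)=\Vert Ax+b\Vert^2$; the example `A`, `b`, `x`, the step `h`, and the helper names are hypothetical choices for illustration, with $A$ deliberately non-square ($3\times2$) so the dimensions of $A^T(Ax+b)$ ($n\times1$) are exercised.

```python
def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

def f(A, b, x):
    """f(x) = ||Ax + b||^2."""
    r = [ai + bi for ai, bi in zip(matvec(A, x), b)]
    return sum(ri * ri for ri in r)

def grad_closed_form(A, b, x):
    """Gradient 2 A^T (Ax + b), an n-vector."""
    r = [ai + bi for ai, bi in zip(matvec(A, x), b)]
    return [2 * g for g in matvec(transpose(A), r)]

def grad_central_diff(A, b, x, h=1e-6):
    """Approximate each partial derivative by a central difference."""
    g = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        g.append((f(A, b, xp) - f(A, b, xm)) / (2 * h))
    return g

A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
b = [2.0, -0.5, 1.0]
x = [0.7, -1.3]

exact = grad_closed_form(A, b, x)
approx = grad_central_diff(A, b, x)
print(exact, approx)
```

Because $f$ is quadratic, the central difference has no truncation error, so any disagreement is pure floating-point rounding.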