Gradient of norm


How can I compute the gradient of $f(x)= \|Ax-b\|_{R^{-1}}^2$? I'm also confused about how to compute the gradient of $g(x) = \|y-Ax\|$ using the chain rule.

I think the first step in taking the gradient of $g(x)$ would be $\nabla g(x) = \frac{d(y-Ax)^\top}{dx}(y-Ax)+ \frac{d(y-Ax)}{dx}(y-Ax)^\top$. However, I'm unsure how to take the derivative of $(y-Ax)$ or of $(y-Ax)^\top$. Any help is highly appreciated.


3 Answers


The gradient of the norm squared is just $\nabla||x||^2=\nabla (x_1^2+\dots +x_n^2)=(\partial/\partial x_1 (x_1^2+\dots +x_n^2),\dots,\partial/ \partial x_n (x_1^2+\dots+x_n^2))=(2x_1,\dots ,2x_n)=2 (x_1,\dots,x_n) $.

So substitute $y-Ax$ for $x$, and apply the chain rule: the inner map $x \mapsto y-Ax$ has derivative $-A$, which contributes a factor of $-A^\top$ to the gradient, giving $\nabla\|y-Ax\|^2 = -2A^\top(y-Ax)$.
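The resulting formula $\nabla_x\|y-Ax\|^2 = -2A^\top(y-Ax)$ can be sanity-checked against central finite differences. This is my own illustrative sketch with made-up random data, assuming NumPy is available:

```python
import numpy as np

# Check grad_x ||y - Ax||^2 = -2 A^T (y - Ax) by central differences.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
y = rng.standard_normal(4)
x = rng.standard_normal(3)

def sq(v):
    r = y - A @ v
    return r @ r  # squared Euclidean norm of the residual

analytic = -2 * A.T @ (y - A @ x)
eps = 1e-6
numeric = np.array([(sq(x + eps * e) - sq(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(analytic, numeric, atol=1e-4)
```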


It is easy to see that $D(||x||^2)(x) = 2x^T$, where $D$ denotes the (total) derivative. The gradient is the transpose of the derivative. Also $D(Ax + b)(x) = A$. By the chain rule, $Df(x) = 2(Ax - b)^TA$. Thus $\nabla f(x) = Df(x)^T = 2A^T(Ax - b)$.

To compute $Dg(x)$, it will be helpful to first compute $D(||x||)(x)$. By the chain rule, \begin{align} D(||x||)(x) &= D(\sqrt{||x||^2})(x) \\ &= \frac{1}{2}(||x||^2)^{-1/2}2x^T \\ &= \frac{1}{||x||}x^T. \end{align} Now the derivative of $g$ is easy to obtain using the chain rule: $Dg(x) = \frac{1}{||y - Ax||}(y - Ax)^T(-A)$. So $\nabla g(x) = -\frac{1}{||y - Ax||}A^T(y - Ax)$.
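Here is a quick numerical check of this answer's formula $\nabla g(x) = -\frac{1}{\|y - Ax\|}A^T(y - Ax)$; the data is random and purely illustrative, assuming NumPy:

```python
import numpy as np

# Check grad_x ||y - Ax|| = -A^T (y - Ax) / ||y - Ax|| by central differences.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
x = rng.standard_normal(3)

def g(v):
    return np.linalg.norm(y - A @ v)

r = y - A @ x  # residual; nonzero for generic random data
analytic = -A.T @ r / np.linalg.norm(r)
eps = 1e-6
numeric = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(analytic, numeric, atol=1e-4)
```

Note that this formula only makes sense where $y - Ax \neq 0$; the norm is not differentiable at the origin.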

Edit: I guess you meant $f(x) = (Ax - b)^TR^{-1}(Ax - b)$. To compute the derivative of $f$, it is convenient to first compute the derivative of $q(x) = x^TBx$, where $B$ is any matrix. For any vector $y$ we have \begin{align} q(x + y) &= (x + y)^TB(x + y) \\ &= (x^T + y^T)(Bx + By) \\ &= x^TBx + x^TBy + y^TBx + y^TBy \\ &= q(x) + x^TBy + (Bx)^Ty + O(|y|^2) \\ &= q(x) + x^T(B + B^T)y + O(|y|^2). \end{align} Hence $Dq(x) = x^T(B + B^T)$. Note with $B = R^{-1}$, $f(x) = q(Ax - b)$. Hence by chain rule, $$Df(x) = (Ax - b)^T(R^{-1} + (R^{-1})^T)A.$$ Taking the transpose gives $$\nabla f(x) = A^T((R^{-1})^T + R^{-1})(Ax - b).$$ If $R$ is symmetric (as weighting matrices in this norm notation typically are), this simplifies to $\nabla f(x) = 2A^TR^{-1}(Ax - b)$.
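The weighted-norm formula can also be verified numerically. The sketch below deliberately uses a non-symmetric $R$ to exercise the general $R^{-1} + (R^{-1})^T$ form; the data is made up, assuming NumPy:

```python
import numpy as np

# Check grad f(x) = A^T (R^{-1} + R^{-T}) (Ax - b) for
# f(x) = (Ax - b)^T R^{-1} (Ax - b), with R not symmetric.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
R = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # shifted to keep R invertible
Rinv = np.linalg.inv(R)
x = rng.standard_normal(3)

def f(v):
    r = A @ v - b
    return r @ Rinv @ r

analytic = A.T @ (Rinv + Rinv.T) @ (A @ x - b)
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(analytic, numeric, atol=1e-4)
```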


For a vector $a$,

$$\nabla(a\cdot x)=a$$

because if you expand the dot product and differentiate on a component $x_i$, what remains is the component $a_i$ of $a$.

Now, viewing the matrix $A$ as a stack of row vectors (each component of $Ax$ is the dot product of a row of $A$ with $x$), the Jacobian is $$\nabla(Ax)=A.$$

From this,

$$\nabla\|Ax-b\|^2=\nabla\left((Ax-b)^T(Ax-b)\right) \\=\nabla\left(x^TA^TAx-b^TAx-(Ax)^Tb+b^Tb\right) \\=\nabla\left(x^TA^TAx-2(A^Tb)^Tx\right) \\=\left(A^TA+(A^TA)^T\right)x-2A^Tb \\=2A^TAx-2A^Tb \\=2A^T(Ax-b),$$

using that $b^TAx=(Ax)^Tb$ is a scalar (so the two cross terms combine) and that $A^TA$ is symmetric.
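The final identity agrees with a finite-difference check (an illustrative sketch with random data, assuming NumPy):

```python
import numpy as np

# Check grad ||Ax - b||^2 = 2 A^T (Ax - b) by central differences.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)
x = rng.standard_normal(2)

def h(v):
    r = A @ v - b
    return r @ r

analytic = 2 * A.T @ (A @ x - b)
eps = 1e-6
numeric = np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
assert np.allclose(analytic, numeric, atol=1e-4)
```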