How to calculate gradient for vector function

2.3k Views Asked by At

I just want to calculate the gradient of $$f(x)=\frac{||Ax-b||_2^2}{c^Tx+d}$$ where $x,b\in R^n,A\in R^{n\times n},c,d\in R$.

I guess the value should be like $$\nabla f(x)=(2A^T(Ax-b)(c^Tx+d)-c||Ax-b||_2^2)/(c^Tx+d)^2$$ By viewing x as scalar and do the calculation, and finally consider about the dimension of each variable. However, I want to know how exactly to do this.

I also notice a lot of rules for scalar has a analogous form for vector also. Is there a formula list for this or a conclusion/theorem about this? It would be awesome if someone can share me this.

Thanks.

2

There are 2 best solutions below

2
On

The derivative, assuming it exists, can always be calculated in the original way: by computing the difference quotient. There's a subtlety here related to whether or not the function is differentiable (e.g. existence of partial derivatives does not quite suffice), but anywhere the function is differentiable you can compute:

$$ Df(x)v = \lim_{h \rightarrow 0} \frac{ f(x + hv) - f(x)}{h}, $$ where $v$ is the direction the derivative is taken in. Therefore, to compute each entry of the matrix form of $Df(x)$, which is often referred to as the gradient, you explicitly compute the directional derivative by taking the limit.

Treating $x$ as a scalar variable, unfortunately, doesn't always work, because in the matrix setting multiplication does not commute.

As an example, I can show you how to differentiate $g(x) = \|Ax - b\|_2^2$. We compute

\begin{align} \frac{1}{h}(g(x + hv) - g(x)) & = \frac{1}{h} \left( hv^T A^T A x + hx^T A^T A v - hv^T A^T b - h b^T A v + o(h) \right) \\ & = v^T A^T A x + x^T A^T A v - v^T A^T b - b^T A v + o(1) \end{align}

Since $g(x)$ is scalar, then each term is equal to its transpose, so this simplifies to $$ 2 x^T A^T Av - 2b^T Av + o(1). $$ This is the derivative of $g$ evaluated at $v$, and hence as a matrix its derivative is $x^T A^T A - 2b^T A$. Sometimes the gradient refers to the transpose of this quantity, but that's a notational nightmare that I hope dies off in the future.

Some important tricks for computing derivatives more quickly can be generalized from the usual tricks for scalar derivatives. For the following I use the following notation:

$$ D[f](x)v $$ Means the derivative (gradient) of $f$ at the point $x$, evaluated on the vector $v$. So there are three variables in the expression.

The Product Rule

Let $f,g$ be differentiable vector functions, and let $B(\cdot, \cdot)$ be a bilinear function. Then $$ D[B(f, g)](x)v = B(D[f](x)v, g(x)) + B(f(x), D[g](x)v). $$

The Quotient Rule

Let $f(x) = x^{-1}$, where $x$ is an element of a unital Banach algebra (more concretely, just take $x$ to be a square invertible matrix). Then

$$ D[f](x)v = -x^{-1} v x^{-1}. $$ Note the order of multiplication here, as matrices (and Banach algebras in general) may not have commutative multiplication.

Chain Rule

Let $f: U \rightarrow V, g: V \rightarrow Z$, where $U \subset X, V \subset Y$, and $X,Y,Z$ are all Banach (vector) spaces. If $f$ is differentiable at $x \in U$ and $g$ is differentiable at $f(x)$, then $g \circ f$ is differentiable at $x$ and

$$ D[g \circ f](x) v = D[g](f(x)) D[f](x)v. $$

Note that $D[f](x)v$ is a vector in $Y$, which works out because $D[g](f(x))$ is a linear map from $Y$ to $Z$.

0
On

First, define some new variables $$\eqalign{ y &= Ax-b &\ \ \ dy = A\,dx \cr T &= y:y &\ \ \ dT = 2y:dy \cr B &= c:x +d &\ \ \ dB = c:dx \cr }$$ where colons denote the Frobenius Inner Product.

Now write the function in terms of these variables, then find its differential and gradient $$\eqalign{ f &= \frac{T}{B} \cr\cr df &= \frac{BdT-TdB}{B^2} \cr &= \frac{B(2y:dy)-T(c:dx)}{B^2} \cr &= \frac{B(2y:A\,dx)-T(c:dx)}{B^2} \cr &= \frac{B(2A^Ty:dx)-T(c:dx)}{B^2} \cr &= \frac{B(2A^Ty)-T(c)}{B^2} \,:dx \cr\cr \frac{\partial f}{\partial x} &= \frac{B(2A^Ty)-T(c)}{B^2} \cr }$$ The Frobenius, Hadamard, Kronecker, and ordinary matrix product follow a very simple rule for the differential of a product $$d(A\star B)=dA\star B+A\star dB$$ Further, the Frobenius and Hadamard products are commutative so terms can be re-arranged and combined much like scalar quantities, e.g. $$d(A\star A)= 2A\star dA$$