I just want to calculate the gradient of $$f(x)=\frac{\|Ax-b\|_2^2}{c^Tx+d}$$ where $x,b,c\in R^n$, $A\in R^{n\times n}$, and $d\in R$.
I guess the value should be $$\nabla f(x)=\frac{2A^T(Ax-b)(c^Tx+d)-c\,\|Ax-b\|_2^2}{(c^Tx+d)^2},$$ which I got by treating $x$ as a scalar, doing the usual quotient-rule calculation, and then fixing up the dimensions of each term. However, I want to know how exactly to do this properly.
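As a sanity check on the guessed formula, here is a quick finite-difference sketch in numpy (the dimensions assume $c\in R^n$ so that $c^Tx$ makes sense; the specific values of $A$, $b$, $c$, $d$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
c = rng.standard_normal(n)
d = 10.0  # chosen large enough that c^T x + d stays away from zero
x = rng.standard_normal(n)

def f(x):
    r = A @ x - b
    return (r @ r) / (c @ x + d)

def grad_f(x):
    # The conjectured gradient: (2 A^T (Ax-b) (c^T x + d) - c ||Ax-b||^2) / (c^T x + d)^2
    r = A @ x - b
    q = c @ x + d
    return (2 * A.T @ r * q - c * (r @ r)) / q**2

# Central differences, one coordinate at a time.
h = 1e-6
num = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
assert np.allclose(num, grad_f(x), atol=1e-5)
```

The assertion passes, so the formula at least agrees with a numerical gradient at a random point.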
I also notice that many rules for scalar derivatives have analogous forms for vectors. Is there a list of such formulas, or a theorem explaining this correspondence? It would be awesome if someone could share it.
Thanks.
The derivative, assuming it exists, can always be calculated from first principles: by computing the difference quotient. There's a subtlety here related to whether or not the function is differentiable (e.g. the existence of partial derivatives does not quite suffice), but anywhere the function is differentiable you can compute:
$$ Df(x)v = \lim_{h \rightarrow 0} \frac{ f(x + hv) - f(x)}{h}, $$ where $v$ is the direction the derivative is taken in. Therefore, to compute each entry of the matrix form of $Df(x)$, which is often referred to as the gradient, you explicitly compute the directional derivative by taking the limit.
Treating $x$ as a scalar variable, unfortunately, doesn't always work, because in the matrix setting multiplication does not commute.
As an example, I can show you how to differentiate $g(x) = \|Ax - b\|_2^2$. We compute
\begin{align} \frac{1}{h}(g(x + hv) - g(x)) & = \frac{1}{h} \left( hv^T A^T A x + hx^T A^T A v - hv^T A^T b - h b^T A v + o(h) \right) \\ & = v^T A^T A x + x^T A^T A v - v^T A^T b - b^T A v + o(1) \end{align}
Since $g(x)$ is scalar, each term is equal to its own transpose, so this simplifies to $$ 2 x^T A^T Av - 2b^T Av + o(1). $$ This is the derivative of $g$ at $x$ evaluated on $v$, and hence, as a matrix, its derivative is $2x^T A^T A - 2b^T A$. Sometimes the gradient refers to the transpose of this quantity, but that's a notational nightmare that I hope dies off in the future.
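You can confirm the result numerically; here is a small numpy sketch comparing the difference quotient against $Dg(x)v = (2x^TA^TA - 2b^TA)v$ at a random point (the data is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)
v = rng.standard_normal(n)

g = lambda x: np.sum((A @ x - b) ** 2)   # g(x) = ||Ax - b||_2^2
Dg = 2 * x @ A.T @ A - 2 * b @ A          # row-vector (matrix) form of the derivative

# Central difference quotient in the direction v.
h = 1e-6
num = (g(x + h * v) - g(x - h * v)) / (2 * h)
assert np.isclose(num, Dg @ v, atol=1e-5)
```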
Some important tricks for computing derivatives more quickly can be generalized from the usual tricks for scalar derivatives. Throughout, I use the notation:
$$ D[f](x)v, $$ which means the derivative (gradient) of $f$ at the point $x$, evaluated on the vector $v$. So there are three variables in the expression.
The Product Rule
Let $f,g$ be differentiable vector functions, and let $B(\cdot, \cdot)$ be a bilinear function. Then $$ D[B(f, g)](x)v = B(D[f](x)v, g(x)) + B(f(x), D[g](x)v). $$
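To see the bilinear product rule in action, take the (arbitrarily chosen) example $B(u,w) = u^Tw$, $f(x) = Ax$, $g(x) = Cx$, so that $B(f,g)(x) = (Ax)^T(Cx)$; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
x = rng.standard_normal(n)
v = rng.standard_normal(n)

F = lambda x: (A @ x) @ (C @ x)  # B(f, g)(x) with B(u, w) = u^T w

# Product rule: B(D[f](x)v, g(x)) + B(f(x), D[g](x)v), with D[f](x)v = Av, D[g](x)v = Cv.
rule = (A @ v) @ (C @ x) + (A @ x) @ (C @ v)

h = 1e-6
num = (F(x + h * v) - F(x - h * v)) / (2 * h)
assert np.isclose(num, rule, atol=1e-5)
```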
The Quotient Rule
Let $f(x) = x^{-1}$, where $x$ is an element of a unital Banach algebra (more concretely, just take $x$ to be a square invertible matrix). Then
$$ D[f](x)v = -x^{-1} v x^{-1}. $$ Note the order of multiplication here, as matrices (and Banach algebras in general) may not have commutative multiplication.
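A numerical sketch of this rule for a (randomly chosen, diagonally shifted to keep it invertible) square matrix $X$ and direction $V$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n)) + 3 * np.eye(n)  # shift keeps X well-conditioned
V = rng.standard_normal((n, n))

inv = np.linalg.inv
rule = -inv(X) @ V @ inv(X)  # D[x^{-1}](X)V = -X^{-1} V X^{-1}

# Central difference of the matrix inverse in the direction V.
h = 1e-6
num = (inv(X + h * V) - inv(X - h * V)) / (2 * h)
assert np.allclose(num, rule, atol=1e-5)
```

Swapping the factors to $-V X^{-2}$ would fail this check, which illustrates why the order of multiplication matters.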
Chain Rule
Let $f: U \rightarrow V, g: V \rightarrow Z$, where $U \subset X, V \subset Y$, and $X,Y,Z$ are all Banach (vector) spaces. If $f$ is differentiable at $x \in U$ and $g$ is differentiable at $f(x)$, then $g \circ f$ is differentiable at $x$ and
$$ D[g \circ f](x) v = D[g](f(x)) D[f](x)v. $$
Note that $D[f](x)v$ is a vector in $Y$, which works out because $D[g](f(x))$ is a linear map from $Y$ to $Z$.
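As an illustration (my own example), take $f(x) = Ax - b$ and $g(y) = \|y\|_2^2$, so $g \circ f$ is the function differentiated earlier; the chain rule gives $D[g \circ f](x)v = D[g](f(x))\,(Av) = 2(Ax-b)^T(Av)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)
v = rng.standard_normal(n)

comp = lambda x: np.sum((A @ x - b) ** 2)  # (g o f)(x) = ||Ax - b||^2

# Chain rule: D[g](f(x)) applied to D[f](x)v = Av, where D[g](y)w = 2 y^T w.
rule = 2 * (A @ x - b) @ (A @ v)

h = 1e-6
num = (comp(x + h * v) - comp(x - h * v)) / (2 * h)
assert np.isclose(num, rule, atol=1e-5)
```

This agrees with the matrix form $2x^TA^TA - 2b^TA$ derived above, applied to $v$.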