Compute the gradient of a vector-valued function $J$ with respect to a scalar


I'm trying to write a steepest descent algorithm in Matlab, and I have to solve a line search problem with a Newton's method subroutine. To make it work, I have to compute the gradient of the following function:

$A \in \mathbb{R}^{n\times n}$, $u, f, g \in \mathbb{R}^n$, $\lambda, h, p \in \mathbb{R}$

The differentiation is with respect to $p$, and the result should be a scalar.

Also, Matlab uses row vectors by default, so $u$ is a row vector and $u^T$ a column vector.

$$J(p) = -(u-pg)Ag^T \;-\;\lambda h^2e^{(u-pg)}g^T \; + \; h^2fg^T$$

When I differentiate with respect to $p$, the last term vanishes immediately, and I believe the first term becomes $$gAg^T,$$ which is a scalar. The trouble comes with the middle term: I'm not sure how to differentiate it so that the result is compatible with the rest of the solution, since it also has to be a scalar.
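A minimal Python sketch of the first-term claim, assuming plain lists stand in for Matlab row vectors; the helper names (`dot`, `vec_mat`, `first_term`) and the test values are hypothetical. Because $-(u-pg)Ag^T = -uAg^T + p\,gAg^T$ is linear in $p$, a central difference should reproduce $gAg^T$ up to rounding.

```python
def dot(x, y):
    # Row vector x times column vector y^T: a scalar.
    return sum(a * b for a, b in zip(x, y))

def vec_mat(x, A):
    # Row vector x times matrix A: another row vector.
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]

def first_term(p, A, u, g):
    # First term of J: -(u - p g) A g^T
    y = [ui - p * gi for ui, gi in zip(u, g)]
    return -dot(vec_mat(y, A), g)

# Hypothetical test data (n = 2).
A = [[2.0, 1.0], [0.5, 3.0]]
u, g = [0.1, -0.2], [1.0, 2.0]

# Central difference of the first term vs. the claimed derivative g A g^T.
p, eps = 0.3, 1e-6
numeric = (first_term(p + eps, A, u, g) - first_term(p - eps, A, u, g)) / (2 * eps)
claimed = dot(vec_mat(g, A), g)
print(numeric, claimed)  # both approximately 17.0
```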



Best answer

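One simple approach is just to write out $$ e^{u - pg} g^T = \sum_i e^{u_i - p g_i} g_i, $$ where the exponential acts element-wise. We can now differentiate with respect to $p$ term by term, obtaining $\sum_i e^{u_i - p g_i}(-g_i) g_i = -\sum_i g_i^2 e^{u_i - p g_i}$. Multiplying by the coefficient $-\lambda h^2$, the middle term of $J$ contributes $+\lambda h^2 \sum_i g_i^2 e^{u_i - p g_i}$ to $J'(p)$, which is a scalar as required.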
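Putting the term-by-term result together with the first term from the question gives $J'(p) = gAg^T + \lambda h^2 \sum_i g_i^2 e^{u_i - p g_i}$. The Python sketch below checks that formula against a finite difference and then uses it in a Newton iteration. It assumes the line search means solving $J(p) = 0$ for $p$, which the question implies but does not state; all names and data are hypothetical.

```python
import math

def dot(x, y):
    # x y^T for row vectors x, y (a scalar).
    return sum(a * b for a, b in zip(x, y))

def vec_mat(x, A):
    # Row vector x times matrix A.
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]

def J(p, A, u, f, g, lam, h):
    # J(p) = -(u - p g) A g^T - lam h^2 exp(u - p g) g^T + h^2 f g^T
    y = [ui - p * gi for ui, gi in zip(u, g)]
    return (-dot(vec_mat(y, A), g)
            - lam * h**2 * sum(math.exp(yi) * gi for yi, gi in zip(y, g))
            + h**2 * dot(f, g))

def dJ(p, A, u, f, g, lam, h):
    # J'(p) = g A g^T + lam h^2 * sum_i g_i^2 exp(u_i - p g_i); f drops out.
    return (dot(vec_mat(g, A), g)
            + lam * h**2 * sum(gi * gi * math.exp(ui - p * gi) for ui, gi in zip(u, g)))

def newton_root(p, args, tol=1e-12, max_iter=50):
    # Newton's method for J(p) = 0 (assumed line-search condition).
    for _ in range(max_iter):
        step = J(p, *args) / dJ(p, *args)
        p -= step
        if abs(step) < tol:
            break
    return p

# Hypothetical test data (n = 2): A, u, f, g, lambda, h.
args = ([[2.0, 1.0], [0.5, 3.0]], [0.1, -0.2], [1.0, 1.0], [1.0, 2.0], 1.0, 0.5)

eps = 1e-6
fd = (J(0.3 + eps, *args) - J(0.3 - eps, *args)) / (2 * eps)
print(fd, dJ(0.3, *args))        # should agree to about 6 digits
p_star = newton_root(0.0, args)
print(p_star, J(p_star, *args))  # J(p_star) should be close to 0
```

In Matlab the two sums collapse to vectorized expressions with `.*` and `exp`, but the loop-free list comprehensions above keep the sketch dependency-free.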

Another answer

The $\exp$ function can only be applied to a vector in an element-wise manner, so we'll need the rule for the differential of an element-wise function $$\eqalign{ F &= F(x) \cr dF &= F'(x)\circ dx \cr }$$where $\circ$ represents the Hadamard (element-wise) product, and $F'$ is the normal (scalar) derivative which is also applied element-wise.
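A small numerical illustration of the element-wise rule, as a Python sketch with hypothetical values: for $F = \exp$ applied element-wise, the change $F(x+dx) - F(x)$ should match $F'(x)\circ dx = \exp(x)\circ dx$ up to second-order terms.

```python
import math

def hadamard(a, b):
    # Element-wise (Hadamard) product of two equal-length vectors.
    return [ai * bi for ai, bi in zip(a, b)]

def F(x):
    # exp applied element-wise.
    return [math.exp(xi) for xi in x]

x = [0.5, -1.0, 2.0]
dx = [1e-7, -2e-7, 3e-7]  # a small, hypothetical perturbation

actual = [a - b for a, b in zip(F([xi + di for xi, di in zip(x, dx)]), F(x))]
predicted = hadamard(F(x), dx)  # F'(x) ∘ dx, and F' = exp when F = exp
err = max(abs(a - p) for a, p in zip(actual, predicted))
print(err)  # second-order small
```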

For typing convenience, define a new vector variable $$y=u-pg$$

Let's use a colon to denote the Frobenius inner product. Then it doesn't matter whether we're dealing with column vectors, row vectors, or matrices: as long as the objects on each side of the colon have the same shape, we're okay.
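One way to picture the colon product, sketched in Python with hypothetical helper names: flatten both operands and sum the products of matching entries. This sketch only checks that the entry counts agree, so a row vector and a column vector with the same entries pair up, as described above; a stricter version would compare shapes.

```python
def flatten(x):
    # Flatten a scalar, vector (list) or matrix (list of lists) into a flat list.
    if isinstance(x, (int, float)):
        return [x]
    return [v for item in x for v in flatten(item)]

def frob(a, b):
    # a : b = sum of element-wise products of a and b.
    fa, fb = flatten(a), flatten(b)
    if len(fa) != len(fb):
        raise ValueError("operands of ':' must have the same number of entries")
    return sum(x * y for x, y in zip(fa, fb))

print(frob([1, 2, 3], [4, 5, 6]))                # row : row       -> 32
print(frob([[1], [2], [3]], [4, 5, 6]))          # column : row    -> 32
print(frob([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # matrix : matrix -> 70
```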

Now use the new variable to rewrite the middle term and find its differential and derivative $$\eqalign{ L &= -\lambda h^2 g:\exp(y) \cr\cr dL &= -\lambda h^2 g:d\exp(y) \cr &= -\lambda h^2 g:\exp(y)\circ dy \cr &= -\lambda h^2 g\circ\exp(y):dy \cr &= +\lambda h^2 g\circ\exp(y):g\,dp \cr\cr \frac{dL}{dp} &= \lambda h^2 g\circ\exp(y):g \cr &= \lambda h^2 g\circ g:\exp(y) \cr &= \lambda h^2 g\circ g\circ\exp(y):1 \cr\cr }$$ The step from $dy$ to $g\,dp$ uses $dy = -g\,dp$, which flips the sign. You can rearrange the derivative into many equivalent forms, using the fact that the Frobenius and Hadamard products are mutually commutative, i.e. $a:(b\circ c) = (a\circ b):c$, and that a matrix (or vector) of all ones, written $1$ above, is the identity element for Hadamard multiplication.
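A quick Python check that the three forms of $dL/dp$ above really are the same scalar and agree with a finite difference of $L(p) = -\lambda h^2\, g:\exp(u - pg)$. This is a sketch with hypothetical data and helper names; vectors are plain lists.

```python
import math

def had(a, b):
    # Hadamard (element-wise) product.
    return [x * y for x, y in zip(a, b)]

def frob(a, b):
    # Frobenius inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical data (n = 3).
lam, h = 1.5, 0.2
u = [0.3, -0.1, 0.8]
g = [1.0, -2.0, 0.5]

def L(p):
    # L = -lam h^2 g : exp(y), with y = u - p g
    y = [ui - p * gi for ui, gi in zip(u, g)]
    return -lam * h**2 * frob(g, [math.exp(yi) for yi in y])

p = 0.4
ey = [math.exp(ui - p * gi) for ui, gi in zip(u, g)]  # exp(y) at this p
ones = [1.0] * len(g)                                 # Hadamard identity

form1 = lam * h**2 * frob(had(g, ey), g)          # g∘exp(y) : g
form2 = lam * h**2 * frob(had(g, g), ey)          # g∘g : exp(y)
form3 = lam * h**2 * frob(had(had(g, g), ey), ones)  # g∘g∘exp(y) : 1

eps = 1e-6
fd = (L(p + eps) - L(p - eps)) / (2 * eps)
print(form1, form2, form3, fd)
```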

If you wish to eliminate the Frobenius product, you can use its trace-equivalent $$A:B=\operatorname{tr}(A^TB)$$
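The trace identity can be spot-checked in Python on a pair of hypothetical non-square matrices, using small list-based helpers written just for this sketch. Both sides sum $A_{ij}B_{ij}$ over all entries, so they should agree exactly here.

```python
def transpose(M):
    # Swap rows and columns.
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    # Plain triple-loop matrix product X Y.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def trace(M):
    # Sum of the diagonal of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def frob(A, B):
    # A : B = sum of element-wise products.
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A = [[1.0, 2.0, 0.5], [3.0, 4.0, -1.0]]  # hypothetical 2x3 matrices
B = [[5.0, 6.0, 2.0], [7.0, 8.0, 1.0]]
print(frob(A, B), trace(matmul(transpose(A), B)))  # 70.0 70.0
```

For two row vectors $a$ and $b$, the same identity reduces to $a:b = ab^T$, the form used in the question.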