Chain rule for vector calculus


I am struggling with an optimisation problem where I have my cost function $J(\mathbf{x}):\mathbb{C}^n\rightarrow\mathbb{C}$ and its analytical gradient $\nabla J(\mathbf{x})=J^\prime(\mathbf{x})$ for an input vector $\mathbf{x}$. However, the input needs to be normalised with $f(\mathbf{x}):\mathbb{C}^n\rightarrow\mathbb{C}^n$, so my new cost function is $J(f(\mathbf{x}))$ with $$ f(\mathbf{x}) = \frac{\mathbf{x}}{\sqrt{\mathbf{x}^H\mathbf{x}}} $$

My question is about the chain rule: $$ \nabla J(f(\mathbf{x})) \stackrel{?}{=} J^\prime(f(\mathbf{x}))\, f^\prime(\mathbf{x}) $$ Is this valid, and what would my $f^\prime(\mathbf{x})$ look like? Is it a matrix or a vector? My attempt via the quotient rule: $$ f^\prime(\mathbf{x}) = \frac{\sqrt{\mathbf{x}^H\mathbf{x}}-\frac{2\mathbf{x}\mathbf{x}^H}{2\sqrt{\mathbf{x}^H\mathbf{x}}}}{\mathbf{x}^H\mathbf{x}} $$

Note that $\mathbf{x}^H$ is the Hermitian (conjugate) transpose of $\mathbf{x}$. Complex numbers are optional here, but allowing them will make my life easier later on.
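For concreteness, the normalisation $f$ is straightforward to sketch in NumPy (a minimal illustration, not part of the question; `np.vdot` conjugates its first argument, giving exactly $\mathbf{x}^H\mathbf{x}$):

```python
import numpy as np

def f(x):
    """Normalise x to unit norm: f(x) = x / sqrt(x^H x)."""
    # np.vdot(x, x) = sum(conj(x) * x), which is real-valued; take .real
    # to drop the zero imaginary part for complex inputs.
    return x / np.sqrt(np.vdot(x, x).real)

x = np.array([3.0, 4.0])
print(f(x))  # -> [0.6 0.8]
```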



BEST ANSWER

Welcome to math.stackexchange.

First, for the chain rule it is better to work with the Jacobian matrix instead of the gradient. We write this as $J'(x)$ or $f'(x)$. The gradient is then the conjugate transpose of the Jacobian, so we have $\nabla J(x)=J'(x)^H$.

Let us define $g(x)=J(f(x))$ and let $n$ be the length of the vector $x$. Then the chain rule says $$g'(x)=J'(f(x))f'(x),$$ where $x\in\mathbb C^n$, $f'(x)\in\mathbb C^{n\times n}$, and $J'(f(x))\in\mathbb C^{1\times n}$.

For the calculation of $f'(x)$, note that the result has to be an $n\times n$ matrix, and that the chain rule may be helpful again (write $f$ as a composition involving the norm $\sqrt{x^Hx}$). Good luck.
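The shapes above can be verified numerically. The sketch below (my own illustration, real case only) uses the known Jacobian of the normalisation, $f'(x) = (I - ff^T)/\lVert x\rVert$, together with a hypothetical linear cost $J(y) = a^Ty$ whose Jacobian is the row vector $a^T$, and checks the chain-rule product against central finite differences:

```python
import numpy as np

def jacobian_f(x):
    """Jacobian of f(x) = x/||x|| in the real case: (I - f f^T) / ||x||."""
    lam = np.linalg.norm(x)
    fx = x / lam
    return (np.eye(x.size) - np.outer(fx, fx)) / lam

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
a = rng.standard_normal(4)          # arbitrary test vector for J(y) = a^T y

# Chain rule: g'(x) = J'(f(x)) f'(x), a 1 x n row vector.
analytic = a @ jacobian_f(x)

# Central finite differences of g(x) = a^T f(x).
eps = 1e-6
g = lambda z: a @ (z / np.linalg.norm(z))
numeric = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```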


Here is the case for $x\in{\mathbb R}^{n}$.

First, assign the length of $x$ to its own variable. $$\eqalign{ \lambda^2 &= x^Tx \quad\implies\quad \lambda\,d\lambda = x^Tdx \\ }$$

Write down the explicit formula for the normalized vector $f$ and calculate its differential. $$\eqalign{ f &= \lambda^{-1}x \\ df &= \lambda^{-1}dx - x\lambda^{-2}d\lambda \\ &= \lambda^{-1}dx - x\lambda^{-3}x^Tdx \\ &= \lambda^{-1}\left(I - ff^T\right)dx \\ &= \left(\frac{I-ff^T}{\sqrt{x^Tx}}\right)dx \\ }$$

Use the gradient of the cost function $J$ with respect to $f$ to write its differential, then perform a change of variables to obtain its gradient with respect to $x$ (the third line uses the fact that $I-ff^T$ is symmetric). $$\eqalign{ dJ &= \left(\frac{\partial J}{\partial f}\right):df \\ &= \left(\frac{\partial J}{\partial f}\right) : \left(\frac{I-ff^T}{\sqrt{x^Tx}}\right)dx \\ &= \left(\frac{I-ff^T}{\sqrt{x^Tx}}\right)\left(\frac{\partial J}{\partial f}\right) : dx \\ \frac{\partial J}{\partial x} &= \left(\frac{I-ff^T}{\sqrt{x^Tx}}\right)\left(\frac{\partial J}{\partial f}\right) \\ }$$

NB: In some of the steps, a colon is used as convenient notation for the trace product, i.e. $$A:B = \operatorname{Tr}\left(A^TB\right)$$

In the complex case you'll need to dive into Wirtinger derivatives, and it's probably not worth the headache.
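The final identity can be sanity-checked numerically. A minimal sketch (my own, real case), using a hypothetical cost $J(y)=\sum_i \sin y_i$ so that $\partial J/\partial f = \cos f$ elementwise; note the denominator in the gradient formula is $\lambda = \sqrt{x^Tx}$, the first power of the norm, as the $\lambda^{-1}(I-ff^T)$ step shows:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
lam = np.linalg.norm(x)             # lambda = sqrt(x^T x)
fx = x / lam                        # f = x / lambda

grad_f = np.cos(fx)                                      # dJ/df for J = sum(sin(f))
grad_x = (np.eye(5) - np.outer(fx, fx)) @ grad_f / lam   # claimed dJ/dx

# Central finite differences of J(f(x)) as an independent check.
eps = 1e-6
J = lambda z: np.sum(np.sin(z / np.linalg.norm(z)))
num = np.array([(J(x + eps * e) - J(x - eps * e)) / (2 * eps)
                for e in np.eye(5)])
print(np.allclose(grad_x, num, atol=1e-5))  # True
```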

Besides, the cost function must be real-valued, since complex values are not comparable. For example, which complex number is larger, $(2+3i)$ or $(3-2i)$?