Check if the following gradient is correct

This question regards the verification of the gradient of a given function.

Notation. Let $N, K \in \mathbb{N}$ be given positive integers, with $K > N$. Let $\mathbf{x} = [x_b \ y_b \ z_b]^T \in \mathbb{R}^{3 \times 1}$ be a vector in Cartesian coordinates, where $^T$ denotes the transpose. Let $\mathbf{1}_K = [1 \ 1 \ \cdots \ 1]^T \in \mathbb{R}^{K \times 1}$ be an all-ones vector of length $K$. Let $\mathbf{y}_i, \mathbf{p}_i \in \mathbb{R}^{K \times 1}$ be given vectors, with $i=1,\dots,N$, and $\mathbf{y}_i \neq \mathbf{p}_i \ \forall i$.

Let $\mathbf{r}_i = [x_i \ y_i \ z_i]^T \in \mathbb{R}^{3 \times 1}$ be given 3-dimensional vectors in a Cartesian reference system, with $i=1,\dots,N$, and $\mathbf{r}_i \neq \mathbf{x} \ \forall i$. Let $\alpha \in \mathbb{R}_+$ be a positive scalar and let $\| \cdot \|$ denote the Euclidean norm. Finally, define the following scalars: $s_i(\mathbf{x})=\log_{10}\| \mathbf{x} - \mathbf{r}_i\|$, with $i=1,\dots,N$.


Denoting with $\nabla$ the gradient with respect to $\mathbf{x}$, the gradient of the scalar function

$$ f(\mathbf{x}) = \sum_{i=1}^N \| \mathbf{y}_i - \mathbf{p}_i + 10 s_i(\mathbf{x}) \alpha \mathbf{1}_K \|^2 $$

should be given by the following expression

$$ (*) \quad \nabla f(\mathbf{x}) = 2 \frac{10 \alpha}{\ln10} \sum_{i=1}^N \frac{\mathbf{x} - \mathbf{r}_i }{\| \mathbf{x} - \mathbf{r}_i\|^2} \left( \mathbf{1}^T_K(\mathbf{y}_i - \mathbf{p}_i) + K \alpha 10 \log_{10}\| \mathbf{x} - \mathbf{r}_i \| \right) $$

where $\ln$ denotes the natural logarithm (a change of base for the logarithm was used during the derivation).


Notation (Part 2). The gradient is written in vector notation, defined as follows. Given $\mathbf{x} = [x \ y \ z]^T \in \mathbb{R}^{3 \times 1}$ and a differentiable scalar-valued function $f$,

$$ \nabla f(\mathbf{x}) = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x} \\ \frac{\partial f(\mathbf{x})}{\partial y} \\ \frac{\partial f(\mathbf{x})}{\partial z} \end{bmatrix} \in \mathbb{R}^{3 \times 1} $$

with a natural generalization to the $n$-dimensional case, i.e. $\mathbf{x} \in \mathbb{R}^{n \times 1}$. In this notation, the gradient of a (differentiable) vector-valued function $\mathbf{g}(\mathbf{x})$, i.e.,

$$ \mathbf{g}(\mathbf{x}) = \begin{bmatrix} g_1(\mathbf{x}) \\ g_2(\mathbf{x}) \\ \vdots \\ g_n(\mathbf{x}) \end{bmatrix}, \quad \mathbf{x} \in \mathbb{R}^{n \times 1} $$

is written as

$$ \frac{\partial \mathbf{g}^T(\mathbf{x})}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial g_1(\mathbf{x})}{\partial \mathbf{x}} \ \frac{\partial g_2(\mathbf{x})}{\partial \mathbf{x}} \ \cdots \ \frac{\partial g_n(\mathbf{x})}{\partial \mathbf{x}} \end{bmatrix} $$

and is to be understood as the transpose of the Jacobian matrix of $\mathbf{g}$. In this notation, letting $A$ be a real symmetric matrix ($A^T = A$), the following property holds

$$ \frac{\partial}{\partial \mathbf{x}} \left( \mathbf{g}^T(\mathbf{x}) \, A \, \mathbf{g}(\mathbf{x}) \right) = 2 \frac{\partial \mathbf{g}^T(\mathbf{x})}{\partial \mathbf{x}} \, A \, \mathbf{g}(\mathbf{x}) $$

which was used in the derivation of the gradient.
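The quadratic-form property above can itself be checked numerically on a small example. The sketch below (plain Python; all names, sizes, and data are made up for illustration) uses an affine map $\mathbf{g}(\mathbf{x}) = B\mathbf{x} + \mathbf{c}$, whose transposed Jacobian is $B^T$, and compares $2 B^T A \, \mathbf{g}(\mathbf{x})$ against a central-difference gradient of $\mathbf{g}^T A \mathbf{g}$:

```python
import random

random.seed(1)
n, m = 3, 4  # x in R^n, g(x) in R^m

# Hypothetical affine map g(x) = B x + c; its transposed Jacobian is B^T
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
c = [random.uniform(-1, 1) for _ in range(m)]
M = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(m)]
A = [[(M[i][j] + M[j][i]) / 2 for j in range(m)] for i in range(m)]  # symmetrized

def g(x):
    return [sum(B[i][j] * x[j] for j in range(n)) + c[i] for i in range(m)]

def quad(x):  # g(x)^T A g(x)
    v = g(x)
    return sum(v[i] * A[i][j] * v[j] for i in range(m) for j in range(m))

def grad_formula(x):  # 2 (dg^T/dx) A g(x) = 2 B^T A g(x)
    v = g(x)
    Av = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
    return [2 * sum(B[i][j] * Av[i] for i in range(m)) for j in range(n)]

def grad_fd(x, h=1e-6):  # central differences, one component at a time
    out = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((quad(xp) - quad(xm)) / (2 * h))
    return out

x0 = [0.2, -0.5, 0.9]
diff = max(abs(a - b) for a, b in zip(grad_formula(x0), grad_fd(x0)))
print(diff)  # roundoff-level residual
```

Since the quadratic form has vanishing third derivatives, the central-difference approximation agrees with the formula up to floating-point roundoff.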


Regarding $(*)$, I am not quite sure about the multiplicative constants (outside the summation), but (luckily enough) they are (almost) irrelevant in my context. What really matters is the structure of the gradient and the presence of the scalar product $\mathbf{1}^T_K (\mathbf{y}_i - \mathbf{p}_i)$.

One of the intermediate steps used during the derivation was the following

$$ \nabla \ln \| \mathbf{x} - \mathbf{r}_i \| = \frac{\mathbf{x} - \mathbf{r}_i}{\| \mathbf{x} - \mathbf{r}_i \|^2} $$

A preliminary numerical verification shows that the gradient differs from its numerical counterpart by a relative error on the order of $10^{-6}$, which is why I am unsure about the multiplicative constants. Any help is appreciated.
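For reference, a discrepancy on the order of $10^{-6}$ is what a finite-difference scheme itself typically produces (one-sided differences with step $h \approx 10^{-6}$ have relative error of roughly that size), so it need not indicate a wrong constant. A minimal check of $(*)$ against central differences, in plain Python with made-up random data (all names, sizes, and values hypothetical), could look like:

```python
import math
import random

random.seed(0)
N, K, alpha = 4, 6, 0.7  # hypothetical sizes and scalar (K > N as stated)

# Made-up data with the stated shapes
r = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N)]
y = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(N)]
p = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(N)]

def f(x):  # f(x) = sum_i || y_i - p_i + 10 s_i(x) alpha 1_K ||^2
    total = 0.0
    for i in range(N):
        s = math.log10(math.sqrt(sum((x[j] - r[i][j]) ** 2 for j in range(3))))
        total += sum((y[i][k] - p[i][k] + 10 * s * alpha) ** 2 for k in range(K))
    return total

def grad_closed(x):  # the candidate expression (*)
    g = [0.0, 0.0, 0.0]
    for i in range(N):
        d = [x[j] - r[i][j] for j in range(3)]
        n2 = sum(c * c for c in d)
        scal = sum(y[i][k] - p[i][k] for k in range(K)) \
             + K * alpha * 10 * math.log10(math.sqrt(n2))
        for j in range(3):
            g[j] += 2 * 10 * alpha / math.log(10) * d[j] / n2 * scal
    return g

def grad_fd(x, h=1e-6):  # central differences
    out = []
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out

x0 = [2.0, 2.0, 2.0]  # kept away from every r_i, so all norms are well bounded
ga, gn = grad_closed(x0), grad_fd(x0)
err = max(abs(a - b) for a, b in zip(ga, gn))
print(err)
```

With central differences the residual `err` lands far below the $10^{-6}$ scale, consistent with the constants in $(*)$ being right and the earlier discrepancy being finite-difference error.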

Best Answer

Using the property that, for a column vector $\mathbf{v}$, the squared Euclidean norm equals the inner product, i.e. $\|\mathbf{v}\|^2=\mathbf{v}^T\mathbf{v}$, the scalar function $f(\mathbf{x})$ can be expressed as $$f(\mathbf{x})=\sum_{i=1}^N(\mathbf{y}_i-\mathbf{p}_i+10s_i(\mathbf{x})\alpha\mathbf{1}_K)^T(\mathbf{y}_i-\mathbf{p}_i+10s_i(\mathbf{x})\alpha\mathbf{1}_K)$$

Expanding the expression for $f(\mathbf{x})$, noting that for real vectors $\mathbf{a}, \mathbf{b}\in\mathbb{R}^{K\times1}$ the scalar product $\mathbf{a}^T\mathbf{b}$ equals $\mathbf{b}^T\mathbf{a}$, and that $\mathbf{1}_K^T\mathbf{1}_K=K$, we have $$\begin{split}f(\mathbf{x})&=&\sum_{i=1}^N\left[\mathbf{y}_i^T\mathbf{y}_i-2\mathbf{y}_i^T\mathbf{p}_i+\mathbf{p}_i^T\mathbf{p}_i+2\mathbf{1}_K^T\mathbf{y}_i(10\alpha s_i(\mathbf{x}))-2\mathbf{1}_K^T\mathbf{p}_i(10\alpha s_i(\mathbf{x}))+(10\alpha s_i(\mathbf{x}))^2(\mathbf{1}_K^T\mathbf{1}_K)\right]\\&=&\sum_{i=1}^N\left[\mathbf{y}_i^T\mathbf{y}_i-2\mathbf{y}_i^T\mathbf{p}_i+\mathbf{p}_i^T\mathbf{p}_i+2(10\alpha)(\mathbf{1}_K^T\mathbf{y}_i-\mathbf{1}_K^T\mathbf{p}_i)s_i(\mathbf{x})+(10\alpha s_i(\mathbf{x}))^2K\right]\end{split}$$ We now differentiate $f(\mathbf{x})$ with respect to $\mathbf{x}$ via the chain rule: differentiate each term with respect to the scalar $s_i(\mathbf{x})$, then differentiate $s_i(\mathbf{x})$ with respect to $\mathbf{x}$. This results in $$\begin{equation}\begin{split}\nabla f(\mathbf{x})&=&\frac{df(\mathbf{x})}{d\mathbf{x}}=\sum_{i=1}^N\frac{df(\mathbf{x})}{ds_i(\mathbf{x})}\frac{ds_i(\mathbf{x})}{d\mathbf{x}}\\&=&\sum_{i=1}^N\left[2(10\alpha)(\mathbf{1}_K^T\mathbf{y}_i-\mathbf{1}_K^T\mathbf{p}_i)\frac{\partial s_i(\mathbf{x})}{\partial \mathbf{x}}+(10\alpha)^2K(2s_i(\mathbf{x}))\frac{\partial s_i(\mathbf{x})}{\partial \mathbf{x}}\right]\\&=&2(10\alpha)\sum_{i=1}^N\left[\mathbf{1}_K^T(\mathbf{y}_i-\mathbf{p}_i)\frac{\partial s_i(\mathbf{x})}{\partial \mathbf{x}}+(K\alpha10)(s_i(\mathbf{x}))\frac{\partial s_i(\mathbf{x})}{\partial \mathbf{x}}\right]\color{blue}{\text{ Eq.(1)}}\end{split}\end{equation}$$

Now it remains to evaluate $\large \frac{\partial s_i(\mathbf{x})}{\partial \mathbf{x}}$ given that $s_i(\mathbf{x})=\log_{10}\|\mathbf{x}-\mathbf{r}_i\|$.

Using a change of base to the natural logarithm $\ln$, we can express $s_i(\mathbf{x})$ as $$s_i(\mathbf{x})=\frac{\ln \|\mathbf{x}-\mathbf{r}_i\|}{\ln 10}$$ This allows us to evaluate the derivative of $s_i(\mathbf{x})$ easily, using the identity that, for a differentiable function $g(x)$, $$\frac{\partial \ln g(x)}{\partial x}=\frac{1}{g(x)}\frac{\partial g(x)}{\partial x}$$ Thus $$\frac{\partial s_i(\mathbf{x})}{\partial \mathbf{x}}=\frac{1}{\ln 10}\frac{\partial \ln\|\mathbf{x}-\mathbf{r}_i\|}{\partial\mathbf{x}}=\frac{1}{\ln 10}\frac{1}{\|\mathbf{x}-\mathbf{r}_i\|}\frac{\partial \|\mathbf{x}-\mathbf{r}_i\|}{\partial \mathbf{x}} \color{blue}{\text{ Eq.(2)}}$$ Using once again the property that the squared Euclidean norm is the inner product of a vector with itself (so the Euclidean norm is the square root of the inner product), we have, via the chain rule, $$\begin{split}\frac{\partial \|\mathbf{x}-\mathbf{r}_i\|}{\partial \mathbf{x}}&=&\frac{\partial [(\mathbf{x}-\mathbf{r}_i)^T(\mathbf{x}-\mathbf{r}_i)]^{1/2}}{\partial \mathbf{x}}\\&=&\frac{\partial [(\mathbf{x}-\mathbf{r}_i)^T(\mathbf{x}-\mathbf{r}_i)]^{1/2}}{\partial [(\mathbf{x}-\mathbf{r}_i)^T(\mathbf{x}-\mathbf{r}_i)]}\frac{\partial [(\mathbf{x}-\mathbf{r}_i)^T(\mathbf{x}-\mathbf{r}_i)]}{\partial \mathbf{x}}\\&=&\frac{\partial [(\mathbf{x}-\mathbf{r}_i)^T(\mathbf{x}-\mathbf{r}_i)]^{1/2}}{\partial [(\mathbf{x}-\mathbf{r}_i)^T(\mathbf{x}-\mathbf{r}_i)]}\frac{\partial (\mathbf{x}^T\mathbf{x}-2\mathbf{r}_i^T\mathbf{x}+\mathbf{r}_i^T\mathbf{r}_i)}{\partial \mathbf{x}}\\&=&\frac{1}{2[(\mathbf{x}-\mathbf{r}_i)^T(\mathbf{x}-\mathbf{r}_i)]^{1/2}}2(\mathbf{x}-\mathbf{r}_i)\\&=&\frac{\mathbf{x}-\mathbf{r}_i}{\|\mathbf{x}-\mathbf{r}_i\|}\end{split}$$ Substituting this expression into $\color{blue}{\text{ Eq.(2)}}$ results in $$\frac{\partial s_i(\mathbf{x})}{\partial \mathbf{x}}=\frac{1}{\ln 10}\frac{\mathbf{x}-\mathbf{r}_i}{\|\mathbf{x}-\mathbf{r}_i\|^2}$$ Having found the gradient of $s_i(\mathbf{x})$, we can substitute 
this into $\color{blue}{\text{ Eq.(1)}}$, leading to $$\frac{df(\mathbf{x})}{d\mathbf{x}}=2\frac{10\alpha}{\ln 10}\sum_{i=1}^N\frac{\mathbf{x}-\mathbf{r}_i}{\|\mathbf{x}-\mathbf{r}_i\|^2}(\mathbf{1}_K^T(\mathbf{y}_i-\mathbf{p}_i)+(K\alpha10)\log_{10}\|\mathbf{x}-\mathbf{r}_i\|)$$
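As a quick numerical spot check of the intermediate derivative $\partial s_i(\mathbf{x})/\partial \mathbf{x}$, the closed form can be compared against central differences of $s_i$ itself (plain Python; the point $\mathbf{x}$ and the vector $\mathbf{r}_i$ below are made up):

```python
import math

x = [0.3, -0.2, 0.5]
r = [1.0, 0.7, -0.4]  # hypothetical r_i

def s(x):  # s_i(x) = log10 ||x - r_i||
    return math.log10(math.sqrt(sum((x[j] - r[j]) ** 2 for j in range(3))))

d = [x[j] - r[j] for j in range(3)]
n2 = sum(c * c for c in d)
closed = [c / (math.log(10) * n2) for c in d]  # (x - r_i) / (ln 10 * ||x - r_i||^2)

h = 1e-7
fd = []
for j in range(3):
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    fd.append((s(xp) - s(xm)) / (2 * h))

print(closed)
print(fd)
```

The two printed vectors agree to many digits, confirming the $1/\ln 10$ factor and the $\|\mathbf{x}-\mathbf{r}_i\|^2$ denominator.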

Thus the given expression for the gradient is indeed correct.