Need help in understanding a section of a paper.


Below is a snapshot of the relevant section in the paper: [image: snapshot of the paper section]

Why does it lead to the statement "From a geometric perspective, the gradient $\frac{\partial L}{\partial x}$ is the projection of $\frac{\partial L}{\partial \bar{x}}$ onto the tangent space of the unit hypersphere at the normal vector $\bar{x}$"? Why is this true, and what does the statement mean?

$\bar{x}$ is the normalized vector of x.

The full paper is at https://arxiv.org/pdf/1704.06369.pdf.


There is 1 best solution below


It's typical nonsense from bad/confusing notation. $L$ is being used to denote two different functions here, one defined on all of space, one defined on the unit sphere. Let's rewrite. Start with $$ L : R^n \to R $$ and define $$ N : R^n \to S^{n-1} : x \mapsto \frac{x}{\|x\|}. $$ Now let $$ H: R^n \to R : x \mapsto L(N(x)). $$

(This is what the paper means by $L(\bar{x})$.)

Now the chain rule says that $$ DH(x) = DL( N(x)) \circ DN(x) $$ (Here I'm using "DH(x)" to denote the derivative transformation for the function $H$ at the point $x$.)

What is $DN(x)$? Well, in the $x$ direction, it's zero, because $N(cx) = N(x)$ for any $c > 0$ and $x \ne 0$. In short: $N$ is constant along rays from the origin. If $v$ is a vector orthogonal to $x$, then $DN(x)[v] = v / \|x \|$, i.e., in directions orthogonal to $x$, $DN(x)$ is just a scaling transformation. Summary: if we pick an orthogonal basis for $R^n$ consisting of $$ x, v_1, \ldots, v_{n-1} $$ where all the $v_i$ are, of course, orthogonal to $x$, hence tangent to the unit sphere at $N(x)$, then the matrix of $DN(x)$ in this basis will be $$ \pmatrix{ 0 & 0 & \cdots & 0 \\ 0 & k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & k } $$ where $k = \frac{1}{\|x\|}$.
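The two claims about $DN(x)$ can be checked numerically. The following is only a sketch using central finite differences; the helper names (`normalize`, `directional_derivative`) and the sample point are illustrative choices, not anything taken from the paper.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in x))

def normalize(x):
    """N(x) = x / ||x||, defined for x != 0."""
    r = norm(x)
    return [c / r for c in x]

def directional_derivative(f, x, v, eps=1e-6):
    """Central-difference estimate of Df(x)[v]."""
    plus = f([xi + eps * vi for xi, vi in zip(x, v)])
    minus = f([xi - eps * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

x = [1.0, -2.0, 2.0]   # arbitrary nonzero point with ||x|| = 3
v = [0.0, 2.0, 2.0]    # orthogonal to x: 0*1 + 2*(-2) + 2*2 = 0

# Along x itself, N does not change, so the derivative should be ~0.
radial = directional_derivative(normalize, x, x)
# Orthogonal to x, the derivative should be ~ v / ||x|| = v / 3.
tangential = directional_derivative(normalize, x, v)

print(radial)
print(tangential)
```

The printed values are finite-difference approximations, so expect agreement with $0$ and $v/3$ only up to small numerical error.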

Now the claim, that $DH$ is the projection of $DL$ onto the tangent space at $N(x)$, is a bit clearer, although perhaps a little misleading. For any vector $v$, $DN(x)[v]$ projects $v$ onto the subspace perpendicular to $x$, but also scales it by $1/\|x\|$. Applying $DL$ to the resulting vector is what's meant by "projecting DL onto the tangent space."
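In gradient (column-vector) form, the chain rule above can be written out explicitly. This is a sketch: writing $\bar{x} = N(x)$ and using the closed form for $DN(x)$ is a restatement of the description above, not notation taken from the paper. $$ DN(x) = \frac{1}{\|x\|}\left(I - \bar{x}\bar{x}^T\right), \qquad \nabla H(x) = DN(x)^T\, \nabla L(\bar{x}) = \frac{1}{\|x\|}\Bigl(\nabla L(\bar{x}) - \langle \nabla L(\bar{x}), \bar{x}\rangle\, \bar{x}\Bigr). $$ The expression in the large parentheses is the orthogonal projection of $\nabla L(\bar{x})$ onto the tangent space of the unit sphere at $\bar{x}$ (the component along $\bar{x}$ is removed), and the factor $1/\|x\|$ is the extra scaling mentioned above.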

Post-comment addition: $$\newcommand{\bx}{{\mathbf x}}\newcommand{\bv}{{\mathbf v}}$$ Let me try to make clear the description of $DN(\bx)$ by an example in 3-space. I'll use a boldface $\bx$ to denote the coordinate triple $(x, y, z)$. I'll assume that all points I mention are away from the origin, so that $\| \bx \| > 0$.

The formula for $N(\bx)$ is $$ N(x, y, z) = \frac{1}{\sqrt{x^2 + y^2 + z^2}}(x, y, z) $$ The first coordinate is $x(x^2 + y^2 + z^2)^{-1/2}$, and the derivatives of that with respect to $x, y,$ and $z$ are $$ \pmatrix{ (x^2 + y^2 + z^2)^{-1/2} - x^2 (x^2 + y^2 + z^2)^{-3/2}\\ -xy(x^2 + y^2 + z^2)^{-3/2}\\ -xz(x^2 + y^2 + z^2)^{-3/2} } = (x^2 + y^2 + z^2)^{-3/2} \pmatrix{ ( y^2 + z^2)\\ -xy\\ -xz }. $$ The other two terms are similar, yielding $$ DN(x, y, z) = (x^2 + y^2 + z^2)^{-3/2} \pmatrix{ ( y^2 + z^2) & -xy & -xz\\ - xy & (x^2 + z^2) & -yz\\ -xz & -yz & (x^2 + y^2) }. $$ Let's take $\bx_0 = (0, 3, 0)$, so $\| \bx_0 \| = 3$. Then $$ DN(\bx_0) = \frac{1}{27} \pmatrix{ 9 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 9 } = \frac{1}{3} \pmatrix{ 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1 }. $$

This is a linear transformation of 3-space. If we apply it to the vector $\bv_0$ that points in the direction of $\bx_0$, i.e., $$ \bv_0 = \pmatrix{0\\ 1\\ 0} $$ we get $$ DN(\bx_0)[\bv_0] = \frac{1}{27} \pmatrix{ 9 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 9 }\pmatrix{0\\ 1\\ 0} = \pmatrix{0\\ 0\\ 0}. $$ If we pick two more vectors orthogonal to $\bv_0$, namely $$ \bv_1 = \pmatrix{1\\0\\0}, \quad \bv_2 = \pmatrix{0\\0\\1}, $$ and multiply each by the matrix above, we get $$ DN(\bx_0)[\bv_1] = \frac{1}{3} \bv_1, \qquad DN(\bx_0)[\bv_2] = \frac{1}{3} \bv_2, $$ which you can verify by multiplying things out yourself. Thus in the $\bv_0, \bv_1, \bv_2$ basis, the matrix for $DN(\bx_0)$ is $$ \pmatrix{0 & 0 & 0\\ 0 & \frac{1}{3}& 0 \\ 0 & 0 & \frac{1}{3}}, $$ exactly as promised.
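If you would rather not multiply things out by hand, the worked example can be reproduced with a few lines of code. This sketch just transcribes the formula for $DN(x, y, z)$ derived above; the function names `dn_matrix` and `apply` are hypothetical helpers introduced for illustration.

```python
def dn_matrix(x, y, z):
    """3x3 Jacobian of N(x, y, z) = (x, y, z) / ||(x, y, z)||, away from the origin."""
    s = (x * x + y * y + z * z) ** -1.5
    return [
        [s * (y * y + z * z), -s * x * y, -s * x * z],
        [-s * x * y, s * (x * x + z * z), -s * y * z],
        [-s * x * z, -s * y * z, s * (x * x + y * y)],
    ]

def apply(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

J = dn_matrix(0.0, 3.0, 0.0)          # DN at x_0 = (0, 3, 0)
v0 = [0.0, 1.0, 0.0]                  # direction of x_0
v1 = [1.0, 0.0, 0.0]                  # orthogonal to x_0
v2 = [0.0, 0.0, 1.0]                  # orthogonal to x_0

print(apply(J, v0))   # should be (numerically) the zero vector
print(apply(J, v1))   # should be (numerically) v1 / 3
print(apply(J, v2))   # should be (numerically) v2 / 3
```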