How to compute the gradient of $\mathbf{Y} = \mathbf{X} + \mathbf{A} \odot \text{sign}(\mathbf{X})$, where $\mathbf{A} \sim U(0,1)$?


I have an equation

$$ \mathbf{Y} = \mathbf{X} + \mathbf{A} \odot \text{sign}(\mathbf{X}) \qquad \text{s.t.}\quad \mathbf{A} \sim U(0,1) $$

where $\mathbf{Y},\mathbf{X},\mathbf{A} \in \mathbb{R}^{m\times n}$ and $U(0,1)$ is the uniform distribution on $(0,1)$.

My question is: how do I compute the gradient $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$?


Best answer:

$ \def\l{\lambda}\def\o{{\tt1}}\def\p{\partial} \def\E{{\cal E}} \def\M{{\mathbb M}} \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} \def\sabs#1{\operatorname{abs}\LR{\l,#1}} \def\ssgn#1{\operatorname{sign}\LR{\l,#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{blue}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\BM{\c{\M}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Instead of the non-differentiable $\tt{abs()}$ function, consider this "soft" version, parameterized by $\:0\lt\l\ll\o$
$$\eqalign{ &B = \sabs{X} \doteq \LR{X\odot X + \l J}^{\odot\o/2} \qquad\qquad\quad \\ &B\odot B = X\odot X + \l J \\ &B\odot dB = X\odot dX \\ &dB = S\odot dX \\ }$$
where $J$ is the all-ones matrix and $S$ is the "soft sign" function
$$\eqalign{ &S = \ssgn X \doteq X \oslash B \\ &X = B\odot S \\ &B\odot dS = \BR{dX - S\odot dB} = \BR{J - S\odot S}\odot dX \\ &dS = \fracLR{B\odot B-X\odot X}{B\odot B}\odot\frac{dX}{B} = \fracLR{\l J\oslash B}{B\odot B}\odot dX \\ }$$
where $\odot$ and $\oslash$ denote Hadamard (elementwise) multiplication and division.
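These differentials are easy to sanity-check numerically. Below is a minimal NumPy sketch (the names `soft_abs`/`soft_sign` and the scalar `lam`, which broadcasts in place of $\l J$, are my own, not part of the answer); it compares both closed forms against central finite differences:

```python
import numpy as np

# Sketch: implement the soft abs/sign above and check the differentials
#   dB = S ∘ dX   and   dS = (λJ ⊘ B∘B∘B) ∘ dX
# against central finite differences.

def soft_abs(X, lam):
    # B = (X∘X + λJ)^{∘1/2}; the scalar lam broadcasts over λJ
    return np.sqrt(X * X + lam)

def soft_sign(X, lam):
    # S = X ⊘ B
    return X / soft_abs(X, lam)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
dX = rng.standard_normal((3, 4))     # arbitrary direction
lam, eps = 1e-2, 1e-6

B, S = soft_abs(X, lam), soft_sign(X, lam)

dB_fd = (soft_abs(X + eps*dX, lam) - soft_abs(X - eps*dX, lam)) / (2*eps)
dS_fd = (soft_sign(X + eps*dX, lam) - soft_sign(X - eps*dX, lam)) / (2*eps)

print(np.allclose(dB_fd, S * dX, atol=1e-8))           # dB = S ∘ dX
print(np.allclose(dS_fd, lam / B**3 * dX, atol=1e-8))  # dS = (λJ ⊘ B∘B∘B) ∘ dX
```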

These differentials can be converted into gradients with the aid of a 6th-order tensor $(\M)$ $$\eqalign{ &\grad BX = S:\BM, \qquad \grad SX = \fracLR{\l J\oslash B}{X\odot X+\l J}:\BM \qquad\qquad \\ }$$
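Because every map here is elementwise, $\BM$ presumably acts by lifting a matrix to a "diagonal" 4th-order array, $\LR{S:\M}_{ijkl} = S_{ij}\,\delta_{ik}\,\delta_{jl}$, which is consistent with $J:\BM = \E$ below. Under that assumption, here is a short NumPy sketch of the gradients as explicit 4-index arrays:

```python
import numpy as np

# Sketch, assuming (S:M)_{ijkl} = S_ij δ_ik δ_jl: build ∂B/∂X and ∂S/∂X
# as explicit 4th-order arrays and check that contracting them with a
# direction dX reproduces the differentials.

m, n = 3, 4
rng = np.random.default_rng(1)
X = rng.standard_normal((m, n))
lam = 1e-2
B = np.sqrt(X * X + lam)      # soft abs
S = X / B                     # soft sign

E = np.einsum('ik,jl->ijkl', np.eye(m), np.eye(n))   # E_ijkl = δ_ik δ_jl
gradB = S[:, :, None, None] * E               # ∂B/∂X = S : M
gradS = (lam / B**3)[:, :, None, None] * E    # ∂S/∂X = (λJ ⊘ B∘B∘B) : M

dX = rng.standard_normal((m, n))
print(np.allclose(np.einsum('ijkl,kl->ij', gradB, dX), S * dX))           # True
print(np.allclose(np.einsum('ijkl,kl->ij', gradS, dX), lam / B**3 * dX))  # True
```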


Note that these "soft" functions are perfectly well-behaved as $X\to 0$ $$\eqalign{ &\lim_{X\to 0} B = {\sqrt\l} J, \qquad \lim_{X\to 0} S = 0 \qquad\qquad\qquad\qquad \\ }$$ and so are their gradients $$\eqalign{ &\lim_{X\to 0} \gradLR BX = 0, \qquad \lim_{X\to 0} \gradLR SX = \frac{J:\BM}{\sqrt\l} \;\doteq\; \frac{\E}{\sqrt\l} \\ }$$ where $\E$ is the 4th-order identity tensor with components $\:\E_{ijkl} = \delta_{ik}\,\delta_{jl}$
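A quick numeric illustration of these limits (same `lam` convention as in the sketches above):

```python
import numpy as np

# Sketch: evaluate the soft functions near X = 0 and compare with the limits.
lam = 1e-2
X = np.full((2, 3), 1e-12)                        # X ≈ 0
B = np.sqrt(X * X + lam)
S = X / B

print(np.allclose(B, np.sqrt(lam)))               # B → √λ J
print(np.allclose(S, 0))                          # S → 0
print(np.allclose(lam / B**3, 1 / np.sqrt(lam)))  # ∂S/∂X factor → 1/√λ
```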


The usual "hard" functions are recovered as $\l\to 0$, and as long as no component of $X$ equals zero, the gradient expressions can still be evaluated $$\eqalign{ &\grad BX = S:\BM, \qquad \grad SX = \fracLR{\l J}{B\odot B\odot B}:\BM = 0 \\ }$$ However, for any $X_{ij}=0,\,$ the corresponding components of both gradients and $S_{ij}$ are undefined, although $B_{ij}=0$ remains well-behaved.

It is possible to define $S_{ij}=\o\,$ at $\,X_{ij}=0,\,$ which will allow $S$ and $\gradLR BX$ to be evaluated.


Assuming $X_{ij}\ne 0$, you can apply the above ideas to your function (with $\l=0$) to obtain $$\eqalign{ Y &= X + A\odot S \\ dY &= dX + A\odot 0 \\ \grad YX &= \grad XX \;=\; \E \\ }$$ or you can use "soft" functions $(\l\ne 0)$ and not worry about $X_{ij}=0$ $$\eqalign{ Y &= X + A:\BM:S \\ dY &= dX + A:\BM:\fracLR{\l J}{B\odot B\odot B}:\BM:dX \\ \grad YX &= \E + \fracLR{\l A}{B\odot B\odot B}:\BM \\ }$$
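To close the loop, here is a finite-difference check of the "soft" gradient of $Y$ itself, i.e. $dY = dX + \fracLR{\l A}{B\odot B\odot B}\odot dX$; $A$ is held fixed since it does not depend on $X$ (again a sketch with my own variable names):

```python
import numpy as np

# Sketch: verify dY = dX + (λA ⊘ B∘B∘B) ∘ dX by central finite differences.
rng = np.random.default_rng(2)
m, n = 3, 4
X = rng.standard_normal((m, n))
A = rng.uniform(0, 1, (m, n))        # A ~ U(0,1), fixed w.r.t. X
lam, eps = 1e-2, 1e-6

def Y(X):
    B = np.sqrt(X * X + lam)
    return X + A * (X / B)           # Y = X + A ∘ soft-sign(X)

dX = rng.standard_normal((m, n))     # arbitrary direction
dY_fd = (Y(X + eps*dX) - Y(X - eps*dX)) / (2*eps)

B = np.sqrt(X * X + lam)
print(np.allclose(dY_fd, dX + lam * A / B**3 * dX, atol=1e-8))  # True
```

As $\l\to 0$ the correction term vanishes wherever $X_{ij}\ne 0$, recovering $\grad YX = \E$ from the "hard" case.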