Is the squared norm of a ReLU function differentiable?

Suppose $f:\mathbb{R}^{m\times n}\to \mathbb{R}$ is defined by $f(X)=\lVert\sigma(X)\rVert_F^2$, where $\sigma(X)=\max\{X,0\}$ is the entry-wise ReLU function, i.e., it maps each entry $x_{ij}$ of the matrix $X$ to $\max\{x_{ij},0\}$. How can we determine whether $f$ is differentiable with respect to $X$? If it is differentiable, how do we calculate its derivative? If it is not, how do we calculate its sub-gradient?

Best answer

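$ \def\o{{\tt1}} \def\p{\partial} \def\r{\operatorname{\sigma}} \def\s{\operatorname{sign}} \def\t{\operatorname{trace}} $Given the scalar sign() function $$\eqalign{ \s(\lambda) = \begin{cases} +\o\quad{\rm if}\;\lambda\ge 0 \\ -\o\quad{\rm otherwise} \\ \end{cases} }$$ the element-wise sign() function, the all-ones matrix $(J)$, and the element-wise (Hadamard) product $(\odot)$ can be used to write an equivalent expression for the element-wise ReLU() function $$\eqalign{ S &= \s(X), \qquad H = \frac 12\big(S+J\big), \qquad \r(X) = H\odot X \\ }$$ Here $H$ is a 0/1 mask (equal to $\o$ where $x_{ij}\ge 0$), so $H\odot H=H$ and $H\odot\r(X)=\r(X)$. Use these results to analyze your function $$\eqalign{ f &= \t\big(\r(X)^T\r(X)\big) \\ d\r &= H\odot dX + \frac 12\,X\odot dS \\ df &= 2\,\t\big(\r(X)^Td\r\big) \\ &= 2\,\t\big(\r(X)^TdX\big) + \t\big((H\odot X\odot X)^TdS\big) \\ }$$ So that's the differential of your $f$ function in terms of the differential of the sign() function.

Formally, the last term vanishes: wherever $x_{ij}\ne 0$ the sign() function is locally constant, so $dS_{ij}=0$, and wherever $x_{ij}=0$ the coefficient $x_{ij}^2$ is zero. Therefore $$\eqalign{ df = 2\,\t\big(\r(X)^TdX\big) \qquad\implies\qquad \frac{\p f}{\p X} = 2\,\r(X) }$$ The same result follows entry-wise, since $f=\sum_{ij}\max\{x_{ij},0\}^2$ and the scalar function $t\mapsto\max\{t,0\}^2$ is continuously differentiable with derivative $2\max\{t,0\}$. So $f$ is differentiable everywhere, and its sub-differential is the single point $\{2\,\r(X)\}$.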

The sign() function is not differentiable at $\lambda=0\,$ and elsewhere its derivative equals zero.
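As a numerical sanity check, here is a minimal sketch in Python with NumPy. It assumes the entry-wise view $f(X)=\sum_{ij}\max\{x_{ij},0\}^2$, which suggests the candidate gradient $2\max\{X,0\}$, and compares that candidate against central finite differences. The helper names `f`, `grad_f`, and `numerical_grad` are made up for this illustration, and one entry is deliberately set to zero to probe the kink of the ReLU.

```python
import numpy as np

def f(X):
    """Squared Frobenius norm of the entry-wise ReLU of X."""
    return np.sum(np.maximum(X, 0.0) ** 2)

def grad_f(X):
    """Candidate gradient 2 * max(X, 0) (assumption being tested here)."""
    return 2.0 * np.maximum(X, 0.0)

def numerical_grad(func, X, eps=1e-6):
    """Central finite-difference approximation of the gradient, entry by entry."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (func(X + E) - func(X - E)) / (2.0 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
X[0, 0] = 0.0  # place one entry exactly at the kink of the ReLU

# Largest absolute gap between the candidate and the finite-difference gradient
max_err = np.max(np.abs(grad_f(X) - numerical_grad(f, X)))
print(max_err)
```

A small gap at every entry, including the one set to zero, is consistent with $f$ being differentiable there. A finite-difference check like this is only evidence, not a proof.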