Understanding why ReLU is discriminatory (proof-understanding)


I would like to understand why the ReLU activation function, which is often used in machine learning, is discriminatory. For this, I found this reference, where page 11 in particular is relevant, but I do not yet grasp the proof.

(Once we know that ReLU is discriminatory, we can use some results by G. Cybenko (1989), stating that a neural network with a single hidden layer (though with arbitrary width) and a linear output is already a universal approximator.)

For the following definitions, I refer to the above-mentioned reference by Guilhoto.

Definition 3.8. Let $n$ be a natural number. We say an activation function $f:\mathbb R\rightarrow \mathbb R$ is $\underline{n\text{-discriminatory}}$ if the only signed Borel measure $\mu$ such that $$\int f\left( y \cdot x + \theta \right) d\mu(x) = 0 \quad \text{for all} \ y\in \mathbb R^n, \text{and} \ \theta\in\mathbb R $$ is the zero measure.

Definition 3.9. We say an activation function $f:\mathbb R\rightarrow \mathbb R$ is $\underline{\text{discriminatory}}$ if it is $\underline{n\text{-discriminatory}}$ for every natural number $n$.

Definition 3.12. The $\underline{\text{Rectified Linear Unit}}$ (also denoted $\underline{\text{ReLU}}$) is a function $\mathbb R\rightarrow \mathbb R$ defined by: $$ReLU(x) := \max(0, x).$$

Lemma 3.15. The ReLU function is $1$-discriminatory.

What Guilhoto does is to construct a bounded, continuous (and therefore Borel-measurable) sigmoidal function $f$ by subtracting two ReLU functions. Since we know that such functions are discriminatory (cf. Cybenko (1989)), we are done. The concrete function he constructs is $$f: \mathbb R\rightarrow \mathbb R, x\mapsto \begin{cases} 0 \quad \text{if} \ x < 0, \\ x \quad \text{if} \ x\in \left[ 0, 1\right], \\ 1 \quad \text{if} \ x > 1. \end{cases}$$ He then claims that any function of the form $f\left(yx + \theta\right)$, $y\ne 0$, can be written as a difference of two ReLU functions, namely $$f(yx + \theta) = ReLU\left( yx - \theta/y\right) - ReLU(yx + (1-\theta)/y) \qquad (15)$$

What I did to verify Eq. (15) is to write out all the terms, e.g. $ReLU(yx - \theta/y) = \max\left\{0, yx - \theta/y\right\}$, $f(yx + \theta) = \begin{cases} 0 \quad \text{if} \ yx + \theta < 0 \\ yx + \theta \quad \text{if} \ yx + \theta\in \left[ 0, 1\right] \\ 1 \quad \ \text{if} \ yx + \theta > 1, \end{cases}$

but I unfortunately do not obtain (15).


1 Answer


I'll write $R$ for $ReLU$.

Suppose that $y=2$, $\theta = 3$, $x = -1$. Then $f(yx+\theta) = f(1) = 1$. On the other side, $-\theta/y = -3/2$, $(1-\theta)/y = -1$, and $yx = -2$, so $R(yx-\theta/y) = R(-2 - 3/2) = R(-7/2) = 0$ and $R(yx+(1-\theta)/y) = R(-2 - 1) = R(-3) = 0$. Hence $R(yx-\theta/y) - R(yx+(1-\theta)/y) = 0 \neq 1 = f(yx + \theta)$. The claimed equality isn't right.
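To double-check this counterexample numerically, here is a small Python sketch (my own, not from the reference); `relu` implements $R$ and `f` implements the piecewise-linear sigmoid from the question:

```python
def relu(t):
    # R(t) = max(0, t)
    return max(0.0, t)

def f(t):
    # 0 below 0, the identity on [0, 1], and 1 above 1
    return min(max(t, 0.0), 1.0)

# Counterexample values from the answer: y = 2, theta = 3, x = -1.
y, theta, x = 2.0, 3.0, -1.0

lhs = f(y * x + theta)                                          # f(1) = 1
rhs = relu(y * x - theta / y) - relu(y * x + (1 - theta) / y)   # 0 - 0 = 0
print(lhs, rhs)  # prints: 1.0 0.0
```

Both ReLU arguments are negative at this point, so the right-hand side of (15) vanishes while the left-hand side equals $1$.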

On the other hand, there is a simple expression for $f$ in terms of $R$: $$f(x) = R(x) - R(x-1),$$ because if $x<0$ then both sides are $0$; if $0\leq x \leq 1$ then the right-hand side is $x - 0 = x$; and if $x > 1$ then the right-hand side is $x - (x-1) = 1$. Now we can substitute $yx+\theta$ to get

$$g(x) = f(yx+\theta) = R(yx + \theta) - R(yx + \theta - 1).$$
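This identity can likewise be checked numerically; the following sketch (my own) sweeps a grid covering all three pieces of $f$:

```python
def relu(t):
    # R(t) = max(0, t)
    return max(0.0, t)

def f(t):
    # 0 below 0, the identity on [0, 1], and 1 above 1
    return min(max(t, 0.0), 1.0)

# Verify f(x) = R(x) - R(x - 1) on a grid spanning all three pieces.
for i in range(-40, 61):
    x = i / 20.0  # x ranges over [-2, 3] in steps of 0.05
    assert abs(f(x) - (relu(x) - relu(x - 1.0))) < 1e-12
print("f(x) = R(x) - R(x - 1) verified on [-2, 3]")
```

Substituting $yx + \theta$ for $x$ then yields the corrected two-ReLU decomposition of $g$.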