Context: "Squashing Functions and Neural Networks"


I hope your lives have been proceeding along well.

Lately, I have been reading about squashing functions in the context of neural networks. Specifically, the book I am working through, Deep Learning Architectures by Ovidiu Calin, discusses squashing functions in its section on activation functions, but does not really provide any explanation regarding the significance of these functions, which use cases suit them best, or frankly how they are used in practice (either in implementations or proofs). Does anyone here have some remarks on where exactly squashing functions come into use with neural networks? Perhaps I am not seeing something obvious, but I would really appreciate any help with better situating squashing functions in my mental framework of machine learning. To help with this request, here is their description in the book:

Definition

A function $\varphi:\mathbb{R} \to [0,1]$ is a squashing function if:

  1. It is non-decreasing, i.e., for all $x_1, x_2 \in \mathbb{R}$ with $x_1 < x_2$, we have $\varphi(x_1) \leq \varphi(x_2)$
  2. $\lim_{x\to -\infty} \varphi(x) = 0$ and $\lim_{x\to \infty} \varphi(x) = 1$

Examples

  • Step function $H(x) = 1_{[0,\infty)}(x)$
  • Ramp function $\varphi(x) = x\,1_{[0,1]}(x) + 1_{(1,\infty)}(x)$
  • Cosine squasher $\varphi(x) = \frac{1}{2}\bigg(1 + \cos (x + \frac{3\pi}{2}) \bigg) 1_{[-\frac{\pi}{2},\frac{\pi}{2}]}(x) + 1_{(\frac{\pi}{2},\infty)}(x)$
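
For concreteness, here is a small sketch (my own, not from the book) of the three examples above, plus the logistic sigmoid, which is the squashing function you will most often meet in practice as a neural-network activation. The ramp function written as `np.clip` is an assumption of mine that the piecewise definition simplifies this way; it agrees with the formula above (0 for $x<0$, $x$ on $[0,1]$, 1 for $x>1$).

```python
import numpy as np

def step(x):
    # Heaviside step H(x): 0 for x < 0, 1 for x >= 0
    return np.where(np.asarray(x) >= 0, 1.0, 0.0)

def ramp(x):
    # Ramp: 0 for x < 0, x on [0, 1], 1 for x > 1
    return np.clip(x, 0.0, 1.0)

def cosine_squasher(x):
    # 0 for x < -pi/2, a smooth half-cosine rise on [-pi/2, pi/2],
    # and 1 for x > pi/2 (note cos(x + 3*pi/2) = sin(x))
    x = np.asarray(x, dtype=float)
    middle = 0.5 * (1 + np.cos(x + 3 * np.pi / 2))
    return middle * (np.abs(x) <= np.pi / 2) + (x > np.pi / 2)

def sigmoid(x):
    # Logistic sigmoid: smooth, strictly increasing, limits 0 and 1
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
```

Each of these is non-decreasing with the required limits, so each satisfies the definition; they differ in smoothness, which is what matters for gradient-based training (the step function has zero gradient almost everywhere, while the sigmoid is differentiable everywhere).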