Non-linearity of neural network activation function


I'm relatively new to neural networks, though I have a reasonable background in mathematics and computing.

I understand that, because the composition of linear transformations is linear, a linear activation function can only give rise to a linear neural network.
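To make that concrete, here is a minimal sketch (with illustrative weight matrices, not from any particular network) showing that two linear layers with no nonlinearity collapse into a single linear map:

```python
def matmul(A, B):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, x):
    """Apply matrix A to the vector x (a flat list)."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

# Two "layers" with identity (linear) activation; weights are illustrative.
W1 = [[1.0, 2.0], [0.5, -1.0]]
W2 = [[0.0, 1.0], [3.0, 1.0]]
x  = [1.0, -2.0]

deep = apply(W2, apply(W1, x))        # layer-by-layer forward pass
collapsed = apply(matmul(W2, W1), x)  # single combined linear map W2 @ W1

assert deep == collapsed              # identical: the depth adds nothing
```

So without a nonlinear activation, any number of layers is expressively equivalent to one.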

Intuitively, it seems that the deviation from linearity contributed by the higher-order terms of the activation function's Taylor series should compound somehow as we go deeper into the network.

Is there a theorem that addresses this? For instance, how does the Taylor series of the final output depend on the second, third, and higher derivatives of the activation function (e.g., of layer $\ell$)?
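For the two-layer case, the ordinary chain rule already shows how curvature compounds: if $\sigma_1$ and $\sigma_2$ are the activations of successive layers (ignoring the affine maps for simplicity), then
$$ (\sigma_2 \circ \sigma_1)''(z) = \sigma_2''\!\bigl(\sigma_1(z)\bigr)\,\sigma_1'(z)^2 + \sigma_2'\!\bigl(\sigma_1(z)\bigr)\,\sigma_1''(z), $$
so each layer's second derivative enters, weighted by first derivatives of the other layer. Faà di Bruno's formula gives the analogous expansion for deeper compositions and higher derivatives.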

Could we approximate functions of arbitrary complexity using only quadratic activation functions? And beyond that, why is $\tanh$ seemingly so common, while a function such as $$ g(z) = \frac{z}{\sqrt{z^2 + 1}}, $$ which has very similar properties, is seemingly so rare?
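As a small self-answer attempt on the quadratic case, here is a sketch (with an arbitrary illustrative affine map, not taken from any real network) tracking the polynomial computed by stacking layers of the form $z \mapsto (az + b)^2$. The degree doubles with each layer, so depth buys polynomial degree exponentially fast:

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def layer(p, a=1.0, b=0.3):
    """One 'neuron': affine map a*p + b, then the quadratic activation z -> z**2.
    (a and b are illustrative weights.)"""
    affine = [a * c for c in p]
    affine[0] += b
    return poly_mul(affine, affine)

p = [0.0, 1.0]  # start from the identity polynomial p(z) = z
degrees = []
for _ in range(4):
    p = layer(p)
    degrees.append(len(p) - 1)

print(degrees)  # -> [2, 4, 8, 16]: degree 2**depth after each layer
```

Since a quadratic-activation network always computes a polynomial, and polynomials are dense in the continuous functions on any compact set (Stone–Weierstrass), arbitrary approximation on compacta seems plausible in principle; whether that counts as a satisfying universal-approximation result is part of what I am asking.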