Theory of mixed-sigmoid neural networks?

47 Views Asked by At

Assume we have an artificial neural network like this

$${\bf s}_k = f_k({{\bf W}_k{\bf s}}_{k-1})$$

with the outputs after each layer ${\bf s_1},{\bf s}_2,\cdots,{\bf s}_n$ and the input ${\bf s}_0$ and the matrices ${\bf W}_k$ representing the connections between the neurons of layer $k-1$ and $k$. Further assume we want to train it using error back-propagation using the chain rule.

Can we somehow derive the properties of the network based on the properties of the sigmoid functions $f_k$ unique at each layer $k$ (and of course their derivatives ${f_k}'(x)$).

It is very easy to see differences between different configurations, but can we quantify the behaviour somehow? enter image description here Note especially the difference in sharpness of boundaries compared to this one. All other parameters are the same : same number of training samples per batch and same number of training batches and same number of neurons at each pairwise $k$. enter image description here For reference we are using the same training data as in this question.