I work as a novice developer at a company that uses deep-learning (DL) frameworks.
A DL model basically consists of several layers, each combining a linear map with a non-linearity (usually ReLU), with millions of yet-to-be-determined parameters. We compute the error at the end and then re-adjust the parameters using the given data sets. The number of layers and how they are connected is called the neural net's architecture, or simply the architecture.
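To make that concrete, here is a minimal sketch of such a network (toy sizes, plain NumPy; all names and dimensions are my own choices for illustration):

```python
import numpy as np

def relu(x):
    # Elementwise non-linearity
    return np.maximum(0.0, x)

def forward(x, params):
    # Alternate linear maps and ReLU non-linearities;
    # the last layer is linear only.
    (W1, b1), (W2, b2) = params
    h = relu(W1 @ x + b1)  # hidden layer
    return W2 @ h + b2     # output layer

rng = np.random.default_rng(0)
params = [(rng.standard_normal((4, 3)), np.zeros(4)),   # maps 3 -> 4
          (rng.standard_normal((2, 4)), np.zeros(2))]   # maps 4 -> 2
y = forward(np.ones(3), params)
print(y.shape)
```

Training then means adjusting `params` to reduce the error on the data; the fixed pattern of layers and connections is the architecture.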
DL is powerful when I use a well-known architecture for a well-known task. However, when I face a new problem, I have to devise a new architecture, considering both computational efficiency and how much complexity the architecture can represent, which we simply call its representation power.
There are very few journals or articles about which level of complexity a given neural-net architecture can represent.
I could only find a few of them, such as Neural networks and rational functions, M. Telgarsky (2017).
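To give a flavor of the results in that literature, here is a toy NumPy sketch (my own construction, not code from the paper) of the depth-separation idea: the tent map is exactly two ReLU units, and composing it k times produces 2^(k-1) peaks, so the oscillation of the represented function grows exponentially with depth while the parameter count grows only linearly.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def tent(x):
    # Tent map on [0, 1] built from exactly two ReLU units:
    # 2x on [0, 0.5] and 2 - 2x on [0.5, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5)

x = np.linspace(0.0, 1.0, 100001)
y = x.copy()
peak_counts = []
for depth in range(1, 5):
    y = tent(y)  # one more layer of the composition
    # Count strict local maxima on the grid.
    peaks = int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))
    peak_counts.append(peaks)
    print(depth, peaks)  # peaks double with each extra layer
```

A shallow network would need exponentially many units to match this oscillation, which is the kind of statement these representation-power papers make precise.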
If someone asks me what DL is, I always answer simply "function approximation". However, I don't know much about approximation theory, not even very standard tools such as Chebyshev polynomials.
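For reference, classical polynomial approximation is easy to experiment with; a minimal NumPy sketch (the target function and degree are arbitrary choices of mine):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Approximate f(x) = exp(x) on [-1, 1] by a degree-5 Chebyshev series.
f = np.exp
nodes = np.cos(np.pi * (np.arange(64) + 0.5) / 64)  # Chebyshev nodes
coeffs = C.chebfit(nodes, f(nodes), deg=5)

# Worst-case error on a dense grid.
grid = np.linspace(-1.0, 1.0, 10001)
err = np.max(np.abs(f(grid) - C.chebval(grid, coeffs)))
print(err)
```

Only six coefficients already give an error far below what a naive fit at equally spaced points would suggest; that trade-off between model size and worst-case error is exactly the kind of question I want to ask about neural architectures.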
What I hope to get from this post is the following:
1) Some more information/references to answer the question "Which structure of neural architecture is required to solve a given task/problem?"
1-1) To answer this question, we might first need to formulate how to define the representation power of a model, and which architecture can achieve it.
2) Any good references for studying approximation theory in the context of neural nets/DL.
I hope this post is not under-valued just because ML/DL frameworks were developed in the engineering sector and grew out of a trial-and-error research culture; I hope instead to motivate more theoretical progress in this field through the active participation of well-versed mathematicians.
Let me try to (partly) answer your questions.
This question is hard to answer and depends highly on the given task/problem. Currently it is more a matter of trial and error than an exact science to say which method/structure/architecture of a neural network works best for which application. The better papers not only implement a (possibly new) method, but also reflect on why these methods seem to do better than others. Furthermore, architecture is not the only thing that impacts the results: performance comes from a combination of network architecture, hyperparameter tuning, loss function choice, data quality, the split sizes of the training, test and validation sets, and much more. Therefore it is hard to attribute performance to the network architecture choice alone.
But in general: convolutional networks seem to do really well for image recognition, and recurrent neural networks achieve great performance on sequential data.
This is even more complicated than what I just described, so I will not go into detail here and instead refer to the previous question.
There are some well-known results for sigmoidal activation functions, such as the classical universal approximation theorems. There are also some theoretical results for ReLU architectures. See here.
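As a quick empirical companion to those results, here is a toy NumPy sketch (my own construction): fix random hidden ReLU features and train only the output layer by least squares; the fit to a smooth target tends to improve as the hidden width grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(x)

errs = []
for width in (4, 16, 64):
    w = rng.standard_normal(width)           # fixed random hidden weights
    b = rng.uniform(-np.pi, np.pi, width)    # fixed random hidden biases
    H = np.maximum(0.0, np.outer(x, w) + b)  # hidden ReLU features
    # Train only the output layer: a linear least-squares problem.
    a, *_ = np.linalg.lstsq(H, target, rcond=None)
    err = float(np.max(np.abs(H @ a - target)))
    errs.append(err)
    print(width, err)
```

This is only a one-dimensional toy, but it shows the sense in which width buys approximation power even before any gradient training of the hidden layer.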